Abstract. Recent work of Gowers [10] and Nagle, Rödl, Schacht, and Skokan
[15], [19], [20] has established a hypergraph removal lemma, which in turn implies
some results of Szemerédi [26] and Furstenberg-Katznelson [7] concerning
one-dimensional and multi-dimensional arithmetic progressions respectively. 
In this paper we shall give a self-contained proof of this hypergraph removal 
lemma. In fact we prove a slight strengthening of the result, which we will use 
in a subsequent paper [29] to establish (among other things) infinitely many 
constellations of a prescribed shape in the Gaussian primes. 


1. Introduction 

In this paper we prove a slight variant of the hypergraph removal lemma established
recently and independently by Gowers [10] and Nagle, Rödl, Schacht and Skokan
[15], [19], [20]. To motivate this lemma, let us first recall the more well-known
triangle removal lemma from graph theory of Ruzsa and Szemerédi [22]. It will
be convenient to work in the setting of tripartite graphs, though we will comment
about the generalization to general graphs shortly. We adopt the following $o()$ and
$O()$ notation: if $x, y_1, \ldots, y_n$ are parameters, we use $o_{x \to 0; y_1, \ldots, y_n}(X)$ to denote
any quantity bounded in magnitude by $X c(x, y_1, \ldots, y_n)$, where $c()$ is a function
which goes to zero as $x \to 0$ for each fixed choice of $y_1, \ldots, y_n$. Similarly, we use
$O_{y_1, \ldots, y_n}(X)$ to denote any quantity bounded by $X C(y_1, \ldots, y_n)$, for some function
$C()$ of $y_1, \ldots, y_n$. If $A$ is a finite set, we use $|A|$ to denote the cardinality of $A$.

Theorem 1.1 (Triangle removal lemma, tripartite graph version). [22] Let $V_1, V_2, V_3$
be finite non-empty sets of vertices, and let $G = (V_1, V_2, V_3, E_{12}, E_{23}, E_{31})$ be a
tripartite graph on these sets of vertices, thus $E_{ij} \subseteq V_i \times V_j$ for $ij = 12, 23, 31$.
Suppose that the number of triangles in this graph does not exceed $\delta |V_1||V_2||V_3|$ for
some $0 < \delta < 1$. Then there exists a graph $G' = G'(V_1, V_2, V_3, E'_{12}, E'_{23}, E'_{31})$ which
contains no triangles whatsoever, and such that $|E_{ij} \setminus E'_{ij}| = o_{\delta \to 0}(|V_i \times V_j|)$ for
$ij = 12, 23, 31$.


One can view G' as a “triangle-free approximation” to G. Note that we do not 
assume that G' is a subgraph of G, but one can easily obtain this conclusion by 
replacing $E'_{ij}$ with $E'_{ij} \cap E_{ij}$ if desired (i.e. one replaces G' by $G' \cap G$). As we shall
see, however, it will be convenient to allow the possibility that G' is not a subgraph 
of G. 
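
As a concrete illustration of the statement of Theorem 1.1 (not of its proof), the following minimal Python sketch counts triangles in a small tripartite graph by brute force and checks that deleting a few edges leaves no triangle. The function name `count_triangles` and the toy graph are assumptions made purely for illustration.

```python
from itertools import product

def count_triangles(V1, V2, V3, E12, E23, E31):
    """Count triples (x1, x2, x3) spanning a triangle in a tripartite graph.

    E12 holds pairs (x1, x2), E23 holds pairs (x2, x3), E31 holds pairs (x3, x1).
    """
    return sum(
        1
        for x1, x2, x3 in product(V1, V2, V3)
        if (x1, x2) in E12 and (x2, x3) in E23 and (x3, x1) in E31
    )

# Toy example: two complete bipartite pieces and a perfect matching.
V1 = V2 = V3 = list(range(3))
E12 = {(a, b) for a in V1 for b in V2}
E23 = {(b, c) for b in V2 for c in V3}
E31 = {(c, a) for c in V3 for a in V1 if c == a}

triangles = count_triangles(V1, V2, V3, E12, E23, E31)

# A triangle-free approximation G': keep E12 and E23, delete the matching E31.
E31_prime = set()
remaining = count_triangles(V1, V2, V3, E12, E23, E31_prime)
removed = len(E31 - E31_prime)
```

Here the graph has 9 triangles out of 27 possible triples, and removing the 3 edges of `E31` destroys all of them. The content of the theorem is that, as the triangle density tends to zero, the proportion of edges that must be removed tends to zero as well.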

Remark 1.2. The above theorem is phrased for tripartite graphs, but it quickly
implies an analogous version for non-partite graphs $G = (V, E)$, by taking three
copies $V_1 = V_2 = V_3 = V$ of the vertex set $V$, and constructing the tripartite graph
$G = (V_1, V_2, V_3, E_{12}, E_{23}, E_{31})$, where $E_{ij}$ consists of those pairs $(x, y)$ which are
the endpoints of an edge in $E$. We omit the details.


It was observed in [22] that Theorem 1.1 implies Roth’s famous theorem [21] that 
subsets of integers of positive density contain infinitely many progressions of length 
three. In [24] it was further observed that Theorem 1.1 implies that subsets of
$\mathbf{Z}^2$ with positive density contain infinitely many right-angled triangles (a result
first obtained in [1]). It was observed earlier (for instance in [16] or [5]) that
an extension of the triangle removal lemma to hypergraphs would similarly imply
Szemerédi’s famous theorem [26] on progressions of arbitrary length; by modifying
the observation in [24], it would also imply a multidimensional extension of that 
theorem due to Furstenberg and Katznelson [7]. We shall return to this issue in 
the sequel [29] to this paper, and discuss the above hypergraph removal lemma in 
detail later in this introduction. 

Theorem 1.1 was proven using the Szemerédi regularity lemma (see e.g. [27], [14] for
a survey of this lemma and its applications), which roughly speaking allows one to 
approximate an arbitrary large and complex graph to arbitrary accuracy by a much 
simpler object; see also [32], [23] for further refinements of Theorem 1.1. This proof 
in fact yields a little bit more information on the triangle-free approximation G' to 
G, namely that G' can be chosen to be “bounded complexity”. More precisely: 

Theorem 1.3 (Strong triangle removal lemma, tripartite graph version). [22] Let
$V_1, V_2, V_3$ be finite non-empty sets of vertices, and let $G = (V_1, V_2, V_3, E_{12}, E_{23}, E_{31})$
be a tripartite graph on these sets of vertices. Suppose that $G$ contains at most
$\delta |V_1||V_2||V_3|$ triangles. Then there exists a graph $G' = G'(V_1, V_2, V_3, E'_{12}, E'_{23}, E'_{31})$
which contains no triangles whatsoever, and such that $|E_{ij} \setminus E'_{ij}| = o_{\delta \to 0}(|V_i \times V_j|)$
for $ij = 12, 23, 31$. Furthermore, there exists a quantity $M = O_\delta(1)$, and partitions
$V_i = V_{i,1} \cup \ldots \cup V_{i,M}$ for each $i = 1, 2, 3$ into sets $V_{i,a}$ (some of which may be empty)
such that for each $ij = 12, 23, 31$, $E'_{ij}$ is the union of sets of the form $V_{i,a} \times V_{j,b}$.

Note that the graph G' constructed in Theorem 1.3 will typically not be a subgraph 
of G. One could make the sets $V_{i,1}, \ldots, V_{i,M}$ to be the same size (with at most
one exception for each i) without much difficulty but we will not endeavour to do 
so here. There is also a version of this lemma for non-tripartite graphs which is 
well known (and essentially equivalent to the tripartite version) but we will not 
reproduce it here. 

It turns out that Theorem 1.1 and Theorem 1.3 can be rephrased in a more “probabilistic”
manner. One reason for doing this is that our arguments will
need two basic concepts from probability theory, namely conditional expectation
and complexity. It seems that with the aid of these concepts, the proofs
become somewhat cleaner to give^1. To explain these concepts we need some notation.
For reasons which will become clearer later, we shall use a rather general
notation which incorporates the above theorems as a special case.


^1 For a more traditional combinatorial approach to these problems, see [17].



HYPERGRAPH REMOVAL LEMMA 




Definition 1.4 (Hypergraphs). If $J$ is a finite set and $d \ge 0$, we define $\binom{J}{d} := \{e \subseteq J : |e| = d\}$
to be the set of all subsets of $J$ of cardinality $d$. A d-uniform hypergraph
on $J$ is then defined to be any subset $H_d \subseteq \binom{J}{d}$. For instance, an undirected
graph $G = (V, E)$ without loops can be viewed as a 2-uniform hypergraph on $V$.

Example 1.5. If $J := \{1, 2, 3\}$, then the triangle $H_2 := \binom{J}{2} = \{\{1, 2\}, \{2, 3\}, \{3, 1\}\}$
is a 2-uniform hypergraph on $J$.

Definition 1.6 (Hypergraph systems). A hypergraph system is a quadruplet
$V = (J, (V_j)_{j \in J}, d, H_d)$, where $J$ is a finite set, $(V_j)_{j \in J}$ is a collection of finite
non-empty sets indexed by $J$, $d \ge 1$ is a positive integer, and $H_d \subseteq \binom{J}{d}$ is a d-uniform
hypergraph. For any $e \subseteq J$, we set $V_e := \prod_{j \in e} V_j$, and let $\pi_e : V_J \to V_e$ be the
canonical projection map.

Remark 1.7. Very roughly speaking, a hypergraph system corresponds to the notion
of a measure-preserving system^2 in ergodic theory, though with the notable difference
that no analogue of the shift operator exists in a hypergraph system. Indeed
the $V_j$ are simply finite sets, and need not have any additive structure whatsoever.

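Definition 1.8 (Conditional expectation). Let $V = (J, (V_j)_{j \in J}, d, H_d)$ be a hypergraph
system. If $f : V_J \to \mathbf{R}$ is a function, we define the expectation
$\mathbf{E}(f) = \mathbf{E}(f(x) | x \in V_J)$ by the formula

$$\mathbf{E}(f) = \mathbf{E}(f(x) | x \in V_J) := \frac{1}{|V_J|} \sum_{x \in V_J} f(x).$$

Similarly, if $\mathcal{B}$ is a $\sigma$-algebra^3 on $V_J$, i.e. a collection of sets in $V_J$ which contains $\emptyset$
and $V_J$, and is closed under unions, intersections, and complementation, we define
the conditional expectation $\mathbf{E}(f|\mathcal{B}) : V_J \to \mathbf{R}$ by the formula

$$\mathbf{E}(f|\mathcal{B})(x) := \frac{1}{|\mathcal{B}(x)|} \sum_{y \in \mathcal{B}(x)} f(y),$$

where $\mathcal{B}(x)$ is the smallest element of $\mathcal{B}$ which contains $x$. For each $e \subseteq J$, let $\mathcal{A}_e$
be the $\sigma$-algebra on $V_J$ defined by $\mathcal{A}_e := \{\pi_e^{-1}(E) : E \subseteq V_e\}$. In other words, $\mathcal{A}_e$
consists of those subsets of $V_J$, membership of which is determined solely by the
co-ordinates of $V_J$ indexed by $e$.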
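
To make Definition 1.8 concrete, here is a minimal Python sketch of the expectation and of the conditional expectation with respect to a $\sigma$-algebra given by its atoms. The names `expectation`, `conditional_expectation` and the labelling function `atom_of` are illustrative assumptions; a $\sigma$-algebra on the finite set $V_J$ is represented by a function assigning to each point a label of the atom containing it, so that $\mathcal{A}_{\{1,2\}}$ corresponds to the label $(x_1, x_2)$.

```python
from fractions import Fraction
from itertools import product

def expectation(f, points):
    """E(f): the average of f over the finite set `points`."""
    points = list(points)
    return sum(Fraction(f(x)) for x in points) / len(points)

def conditional_expectation(f, points, atom_of):
    """E(f|B) as a dict x -> value, where the atoms of B are the level sets of atom_of."""
    points = list(points)
    atoms = {}
    for x in points:
        atoms.setdefault(atom_of(x), []).append(x)
    # On each atom, E(f|B) is the average of f over that atom.
    average = {k: sum(Fraction(f(y)) for y in ys) / len(ys) for k, ys in atoms.items()}
    return {x: average[atom_of(x)] for x in points}

# V_J = V_1 x V_2 x V_3 with |V_1| = 2, |V_2| = 3, |V_3| = 2.
VJ = list(product(range(2), range(3), range(2)))
f = lambda x: x[0] + x[1] * x[2]

# Conditioning on A_{1,2}: only the first two coordinates are "known".
cond = conditional_expectation(f, VJ, lambda x: (x[0], x[1]))
mean_f = expectation(f, VJ)
mean_cond = sum(cond.values()) / len(VJ)
```

Conditioning on $\mathcal{A}_{\{1,2\}}$ averages out the third coordinate, so at the point $(1, 2, 0)$ the value is $1 + 2 \cdot \frac{1}{2} = 2$. The averages of `f` and of `cond` agree, which is the usual tower property.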


^2 A measure preserving system is a probability space $(X, \mathcal{B}, \mu)$ together with a shift $T : X \to X$
that preserves the measure $\mu$. The ergodic approach to Szemerédi’s theorem, as introduced by
Furstenberg [6], recasts the problem of finding arithmetic progressions as that of understanding
averages such as $\liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \mu(A \cap T^n A \cap \ldots \cap T^{(k-1)n} A)$. This can in turn be viewed
as the problem of understanding shift operators such as $(T, T^2, \ldots, T^{k-1})$ on a product space
$X \times \ldots \times X$. This has some intriguing parallels with the combinatorial approach, in which the
problem of obtaining arithmetic progressions in a set $V$ is reduced to that of analyzing Cayley-type
graphs or hypergraphs, which can be viewed as subsets of $V \times \ldots \times V$. We do not know
of any formal connection between these two approaches; nevertheless there do appear to be some
interesting similarities.

^3 Of course, since $V_J$ is finite, we do not need to distinguish finite unions and countable unions,
and could simply call $\mathcal{B}$ an “algebra”, or even a “partition”; the latter notation is in fact used
in most treatments of the regularity lemma. However we prefer the notation of $\sigma$-algebra as
being highly suggestive, evoking ideas and insights from probability theory, measure theory, and
information theory.





TERENCE TAO 


One can interpret the usage of these averages as imposing the uniform probability
distribution on each $V_e$, which basically amounts to introducing a set $(x_j)_{j \in J}$ of
independent random variables, with each $x_j$ ranging uniformly in $V_j$.

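If $\mathcal{B}_1$ and $\mathcal{B}_2$ are two $\sigma$-algebras on $V_J$, we use $\mathcal{B}_1 \vee \mathcal{B}_2$ to denote the smallest
$\sigma$-algebra that contains both $\mathcal{B}_1$ and $\mathcal{B}_2$; this corresponds to the familiar concept of
the common refinement of two partitions. We can more generally define $\bigvee_{i \in I} \mathcal{B}_i$
for any collection $(\mathcal{B}_i)_{i \in I}$ of $\sigma$-algebras.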

Example 1.9. For any finite non-empty sets $V_1, V_2, V_3$, the quadruplet $V = (J, (V_j)_{j \in J}, 2, H_2)$
is a hypergraph system, where $J := \{1, 2, 3\}$ and $H_2 := \binom{J}{2}$ are as in Example 1.5.
The $\sigma$-algebra $\mathcal{A}_{\{1,2\}}$ is the algebra of all subsets of $V_1 \times V_2 \times V_3$ which do not
depend on the third variable, and thus take the form $E \times V_3$ for some $E \subseteq V_1 \times V_2$.
Similarly for $\mathcal{A}_{\{2,3\}}$ and $\mathcal{A}_{\{3,1\}}$.

Definition 1.10 (Complexity). Let $V = (J, (V_j)_{j \in J}, d, H_d)$ be a hypergraph system.
If $\mathcal{B}$ is a $\sigma$-algebra in $V_J$, we define the complexity $\mathrm{complex}(\mathcal{B})$ of $\mathcal{B}$ to be the
least number of sets in $V_J$ needed to generate $\mathcal{B}$ as a $\sigma$-algebra; this can be viewed
as a simplified version of the Shannon entropy $H(\mathcal{B})$, which we will not use here.

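We observe the obvious inequalities

$$\mathrm{complex}(\mathcal{B}_1 \vee \mathcal{B}_2) \le \mathrm{complex}(\mathcal{B}_1) + \mathrm{complex}(\mathcal{B}_2) \text{ for arbitrary } \mathcal{B}_1, \mathcal{B}_2 \tag{1}$$

and

$$|\mathcal{B}| \le 2^{2^{\mathrm{complex}(\mathcal{B})}}. \tag{2}$$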

Remark 1.11. If one views $\mathcal{B}$ as a partition, the complexity is essentially the logarithm
of the number of cells in the partition. From an information-theoretic perspective,
the complexity measures how many bits of information are needed to know
which atom of $\mathcal{B}$ a given point in $V_J$ lies in.
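
The relation between generators, atoms and the size of a $\sigma$-algebra can be checked by hand on a small example. The sketch below is an illustration under stated assumptions: the helper names `atoms` and `sigma_algebra` are hypothetical, and the universe is an arbitrary 8-point set standing in for $V_J$. It enumerates the $\sigma$-algebra generated by $k$ sets, which has at most $2^k$ atoms and hence at most $2^{2^k}$ members.

```python
from itertools import combinations

def atoms(universe, generators):
    """Atoms of the sigma-algebra generated by `generators` on a finite universe."""
    cells = {}
    for x in universe:
        # Two points share an atom exactly when no generator separates them.
        signature = tuple(x in g for g in generators)
        cells.setdefault(signature, set()).add(x)
    return [frozenset(c) for c in cells.values()]

def sigma_algebra(universe, generators):
    """All members of the generated sigma-algebra, i.e. all unions of atoms."""
    cells = atoms(universe, generators)
    return {
        frozenset().union(*chosen)
        for r in range(len(cells) + 1)
        for chosen in combinations(cells, r)
    }

universe = range(8)
three_sets = [{0, 1, 2, 3}, {0, 1, 4, 5}, {0, 2, 4, 6}]  # complexity at most 3
nested = [{0, 1, 2, 3}, {0, 1}]                           # complexity at most 2

full = sigma_algebra(universe, three_sets)
small = sigma_algebra(universe, nested)
```

The three "independent" generators produce 8 atoms and all 256 subsets, so the bound $2^{2^3}$ is attained; the two nested generators produce only 3 atoms and 8 members, comfortably below $2^{2^2} = 16$.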


If $E$ is a subset of $V_J$, we let $1_E : V_J \to \mathbf{R}$ be the indicator function, thus $1_E(x) := 1$
when $x \in E$ and $1_E(x) := 0$ otherwise. In particular, $\mathbf{E}(1_E) = |E|/|V_J|$ can be
viewed as the “density” or “probability” of $E$ in $V_J$.

With all this notation, Theorem 1.3 becomes 

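Theorem 1.12 (Strong triangle removal lemma, $\sigma$-algebra version). Let $V = (J, (V_j)_{j \in J}, d, H_d)$
be a hypergraph system with $J = \{1, 2, 3\}$, $d = 2$, and $H_d = \binom{J}{2} = \{\{1, 2\}, \{2, 3\}, \{3, 1\}\}$.
For each $e \in H_d$, let $E_e$ be a set in $\mathcal{A}_e$ such that

$$\mathbf{E}\Big(\prod_{e \in H_d} 1_{E_e}\Big) \le \delta$$

for some $0 < \delta < 1$. Then there exist sets $E'_e \in \mathcal{A}_e$ for $e \in H_d$ such that

$$\bigcap_{e \in H_d} E'_e = \emptyset$$

and

$$\mathbf{E}(1_{E_e \setminus E'_e}) = o_{\delta \to 0}(1) \text{ for all } e \in H_d.$$

Furthermore, for each $i \in J$ there exist sub-algebras $\mathcal{B}_i \subseteq \mathcal{A}_{\{i\}}$ such that

$$\mathrm{complex}(\mathcal{B}_i) = O_\delta(1) \text{ for } i \in J$$

and

$$E'_e \in \bigvee_{i \in e} \mathcal{B}_i \text{ for } e \in H_d.$$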


It is easy to see that Theorem 1.3 and Theorem 1.12 are equivalent. The notation 
here may appear quite cumbersome, but the advantages of these notations will 
hopefully become more apparent when we prove a generalization of this result 
shortly. 

The case of $d = 2$, and $J$ and $H_d$ arbitrary, was treated in [3]. It was then conjectured
in that paper that a result of the above type should also hold for higher
$d$. The generalization of Theorem 1.1 to the higher $d$ case was accomplished only
recently and independently by Gowers [11] and Nagle, Rödl, Schacht, Skokan [15],
[19], [20], using the language of hypergraphs. It turns out that Theorem 1.3 or Theorem 1.12
can similarly be generalized, and with the notation already developed,
the extension is very easy to state:

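Theorem 1.13 (Hypergraph removal lemma). [11], [15], [19], [20] Let $V = (J, (V_j)_{j \in J}, d, H_d)$
be a hypergraph system. For each $e \in H_d$, let $E_e$ be a set in $\mathcal{A}_e$ such that

$$\mathbf{E}\Big(\prod_{e \in H_d} 1_{E_e}\Big) \le \delta \tag{3}$$

for some $0 < \delta < 1$. Then for each $e \in H_d$ there exists a set $E'_e \in \mathcal{A}_e$ such that

$$\bigcap_{e \in H_d} E'_e = \emptyset \tag{4}$$

and

$$\mathbf{E}(1_{E_e \setminus E'_e}) = o_{\delta \to 0; J}(1) \text{ for all } e \in H_d. \tag{5}$$

Furthermore, there exist sub-algebras $\mathcal{B}_{e'} \subseteq \mathcal{A}_{e'}$ whenever $e' \subseteq J$ and $|e'| < d$
obeying the complexity estimate

$$\mathrm{complex}(\mathcal{B}_{e'}) = O_{J,\delta}(1) \text{ whenever } e' \subseteq J \text{ and } |e'| < d \tag{6}$$

(so in particular $|\mathcal{B}_{e'}| = O_{J,\delta}(1)$, thanks to (2)) and

$$E'_e \in \bigvee_{e' \subsetneq e} \mathcal{B}_{e'} \text{ for all } e \in H_d. \tag{7}$$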


Clearly Theorem 1.12 is a special case of Theorem 1.13. We have attributed this 
theorem to Gowers [11] and Nagle-Rödl-Schacht-Skokan [15], [19], [20] because it
follows from their methods, although a theorem of this type is not stated explicitly 
in those papers. One can formulate variants of this removal lemma in the case 
when Hd is not d-uniform but we will not do so here. A related result has recently 
been obtained in [17], using techniques similar in spirit to those here (though with 
substantially different notation). 

The main purpose of this paper is to explicitly prove Theorem 1.13 in a completely
self-contained manner. In a subsequent paper [29], we will then transfer this theorem
(as in [12]) to obtain a relative version of Theorem 1.13, restricted to a suitably
pseudorandom subset of $V_J$. This will then be used (again following [12]) to deduce
the existence of infinitely many constellations of a prescribed shape in the
Gaussian primes and similar sets.

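As a corollary of Theorem 1.13, we obtain the hypergraph removal lemma in a
formulation closer to that of Gowers or Nagle-Rödl-Schacht-Skokan:

Corollary 1.14 (Hypergraph removal lemma, partite hypergraph version). [11],
[15], [19], [20] Let $(V_j)_{j \in J}$ be a collection of finite non-empty sets. Let $0 \le d \le |J|$,
and let $H_d \subseteq \binom{J}{d}$ be a d-uniform hypergraph on $J$. For each $e \in H_d$, let $E_e$ be a
subset of $\prod_{j \in e} V_j$. Suppose that

$$\Big|\Big\{(x_j)_{j \in J} \in \prod_{j \in J} V_j : (x_j)_{j \in e} \in E_e \text{ for all } e \in H_d\Big\}\Big| \le \delta \prod_{j \in J} |V_j|$$

for some $0 < \delta < 1$; in other words, the $J$-partite hypergraph $G = ((V_j)_{j \in J}, (E_e)_{e \in H_d})$
contains at most $\delta \prod_{j \in J} |V_j|$ copies of $H_d$. Then for each $e \in H_d$ there exists
$E'_e \subseteq \prod_{j \in e} V_j$ such that

$$\Big\{(x_j)_{j \in J} \in \prod_{j \in J} V_j : (x_j)_{j \in e} \in E'_e \text{ for all } e \in H_d\Big\} = \emptyset$$

(i.e. the $J$-partite hypergraph $G' = G'((V_j)_{j \in J}, (E'_e)_{e \in H_d})$ contains no copies of $H_d$
whatsoever), and such that $|E_e \setminus E'_e| = o_{\delta \to 0; |J|}(\prod_{j \in e} |V_j|)$ for all $e \in H_d$.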

The deduction of Corollary 1.14 from Theorem 1.13 is analogous to the deduction
of Theorem 1.1 from Theorem 1.12 and is omitted. It seems quite likely that we can
obtain similar analogues for non-partite hypergraphs, just as was the case with the
non-partite version of Theorem 1.1; see [11], [15], [19], [20] for some examples of this,
though for applications to Szemerédi-type theorems it is the partite version which is
of importance. It should be unsurprising that Theorem 1.1 is then the special case
of Corollary 1.14 applied to the (hyper)graph in Example 1.5. The case $|J| = 4$
and $H_3 = \binom{J}{3}$ was treated in [5]. Just as Theorem 1.1 implies Roth’s theorem,
Corollary 1.14 implies Szemerédi’s theorem [26] on arithmetic progressions, as well
as the multidimensional generalization of that theorem due to Furstenberg and
Katznelson [7]; see [25], [5], [11], [20] for further discussion^4. Thus this paper
provides a moderately short and self-contained proof of these theorems, although
we emphasize that this goal was already achieved in the prior work of [11], [15],
[19], [20].

The remainder of this paper is devoted to proving Theorem 1.13. As one might 
expect from the previous proofs of these types of results, our proof shall proceed 
by proving a “hypergraph regularity lemma” and a “hypergraph counting lemma”. 
The arguments are broadly along similar lines to those of Gowers or Nagle, Rödl,
Schacht, and Skokan, although it seems that using the notation of $\sigma$-algebras and
probability theory allows for slightly cleaner arguments.


^4 It was also recently observed that this hypergraph removal result implies another theorem
of Furstenberg and Katznelson [8] on affine subspaces of dense subsets of high-dimensional finite
field vector spaces; see [18].

2. Pseudorandomness and the regularity lemma

Henceforth the hypergraph system $V = (J, (V_j)_{j \in J}, d, H_d)$ will be fixed. In this section
we shall state and prove a $\sigma$-algebra version of the hypergraph regularity lemma
(Lemma 2.9). This lemma establishes a dichotomy between pseudorandomness (or
$\varepsilon$-regularity, or small discrepancy) on one hand, and bounded complexity^5 on the
other; the regularity lemma then asserts, very roughly speaking, that any given set
or $\sigma$-algebra (or family of $\sigma$-algebras) can be split into a component with bounded
complexity, and a component which is pseudorandom (has small discrepancy).

In order to state the regularity lemma we need to formalize the notion of pseudorandomness
(or more precisely, of discrepancy). We shall also need a notion of the
energy of a $\sigma$-algebra in order to keep track of the inductions that go into the proof
of the regularity lemma, and also in the final statement of our regularity lemma.

We shall not state the final regularity lemma we need (Lemma 2.9) immediately.
To begin with, we set out our notation for discrepancy and energy. Initially we shall
be focusing primarily on a single edge $e \subseteq J$, as opposed to an entire hypergraph
$H_d$, though this hypergraph shall emerge later in this section.

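Definition 2.1 (e-discrepancy). For any $e \subseteq J$, we define the skeleton $\partial e$ of $e$ to
be the set $\{f \subseteq e : |f| = |e| - 1\}$. If $e \subseteq J$, $E_e \subseteq V_J$, and $\mathcal{B}$ is a $\sigma$-algebra on $V_J$,
we define the e-discrepancy $\Delta_e(E_e|\mathcal{B})$ of the set $E_e$ with respect to the $\sigma$-algebra $\mathcal{B}$
to be the quantity^6

$$\Delta_e(E_e|\mathcal{B}) := \sup_{E_f \in \mathcal{A}_f \, \forall f \in \partial e} \Big| \mathbf{E}\Big( \big(1_{E_e} - \mathbf{E}(1_{E_e}|\mathcal{B})\big) \prod_{f \in \partial e} 1_{E_f} \Big) \Big| \tag{8}$$

where the supremum is over all collections of sets $(E_f)_{f \in \partial e}$, where each $E_f$ lies in
the $\sigma$-algebra $\mathcal{A}_f$. Note that since $V_J$ is finite, so is $\Delta_e(E_e|\mathcal{B})$.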


Roughly speaking, the e-discrepancy $\Delta_e(E_e|\mathcal{B})$ measures the amount of “structure”
in $E_e$ which is not already captured by the $\sigma$-algebra $\mathcal{B}$. By “structure”, we mean
sets which can be easily described by sets from the lower order $\sigma$-algebras $\mathcal{A}_f$, as
opposed to a generic set in $\mathcal{A}_e$ which in general is likely to have no good decomposition
(or approximate decomposition) into sets from the $\mathcal{A}_f$. Thus if $\Delta_e(E_e|\mathcal{B})$
is small, we expect $E_e$ to behave randomly (i.e. in an unstructured way) on most
atoms of $\mathcal{B}$. The $\Delta_e(E_e|\mathcal{B})$ generalize the concept of $\varepsilon$-regularity, as the following
example shows:

^5 This is very similar to the dichotomy between weak mixing and compactness in ergodic theory,
which is of great utility in proving statements such as Szemerédi’s theorem; it seems of interest
to explore these connections further.

^6 This quantity is related to the Gowers uniformity norms used for instance in [10], [11], [12],
but we will not explicitly introduce those norms here. This quantity is also related to the notion
of a pseudorandom hypergraph, studied for instance in [13].

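Example 2.2. Let $G = (V_1, V_2, E_{12})$ be a bipartite graph between two finite non-empty
sets $V_1, V_2$; we can thus view $E_{12}$ as a set in $\mathcal{A}_{\{1,2\}}$, where $V$ is the hypergraph
system $V = (J, (V_j)_{j \in J}, d, H_d)$ with $J = \{1, 2\}$, $d = 2$, and $H_d = \binom{J}{2} = \{\{1, 2\}\}$.
Suppose that $E_{12}$ has density $\mathbf{E}(1_{E_{12}}) = \sigma$ (i.e. $\sigma = |E_{12}|/|V_1||V_2|$), and that

$$\Delta_{\{1,2\}}(E_{12}|\mathcal{A}_\emptyset) \le \varepsilon$$

for some $\varepsilon > 0$. Then by definition we have

$$|\mathbf{E}((1_{E_{12}} - \sigma) 1_{E_1} 1_{E_2})| \le \varepsilon \text{ whenever } E_1 \in \mathcal{A}_{\{1\}}, E_2 \in \mathcal{A}_{\{2\}}.$$

In the original setting of the bipartite graph $G$, this is equivalent to asserting that

$$\big| |E_{12} \cap (E_1 \times E_2)| - \sigma |E_1||E_2| \big| \le \varepsilon |V_1||V_2|$$

for all $E_1 \subseteq V_1$ and $E_2 \subseteq V_2$. The reader may recognize this as a pseudorandomness
condition or $\varepsilon$-regularity condition on the graph $G$. If we replace $\mathcal{A}_\emptyset$ by a finer
$\sigma$-algebra such as $\mathcal{B}_1 \vee \mathcal{B}_2$ for some $\mathcal{B}_1 \subseteq \mathcal{A}_{\{1\}}$ and $\mathcal{B}_2 \subseteq \mathcal{A}_{\{2\}}$, where the complexity of
$\mathcal{B}_1$ and $\mathcal{B}_2$ is small compared to $1/\varepsilon$, then a condition such as $\Delta_{\{1,2\}}(E_{12}|\mathcal{B}_1 \vee \mathcal{B}_2) \le \varepsilon$
states, roughly speaking, that the graph $G$ is $\varepsilon$-regular on “most” of the atoms
$A_1 \times A_2$ in the partition associated to $\mathcal{B}_1 \vee \mathcal{B}_2$.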
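
For a very small bipartite graph, the discrepancy with respect to the trivial $\sigma$-algebra in Example 2.2 can be computed by exhaustive search over all pairs $E_1 \subseteq V_1$, $E_2 \subseteq V_2$. The following Python sketch does this; the names `subsets` and `discrepancy_trivial` are hypothetical, and the brute-force search takes time exponential in $|V_1| + |V_2|$, so it is only a sanity check on toy examples.

```python
from fractions import Fraction
from itertools import chain, combinations

def subsets(items):
    """All subsets of `items`, as tuples."""
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def discrepancy_trivial(V1, V2, E12):
    """Discrepancy of E12 relative to the trivial sigma-algebra, by brute force."""
    n = len(V1) * len(V2)
    sigma = Fraction(len(E12), n)  # edge density of the graph
    best = Fraction(0)
    for E1 in subsets(V1):
        for E2 in subsets(V2):
            edges = sum(1 for a in E1 for b in E2 if (a, b) in E12)
            # | |E12 n (E1 x E2)| - sigma |E1||E2| | / (|V1||V2|)
            best = max(best, abs(edges - sigma * len(E1) * len(E2)) / n)
    return best

V1 = V2 = list(range(4))
complete = {(a, b) for a in V1 for b in V2}
block = {(a, b) for a in V1 for b in V2 if a < 2 and b < 2}

d_complete = discrepancy_trivial(V1, V2, complete)
d_block = discrepancy_trivial(V1, V2, block)
```

The complete bipartite graph has discrepancy 0, while the "block" graph, whose edges are concentrated on a $2 \times 2$ corner, has discrepancy $3/16$, witnessed by $E_1 = E_2 = \{0, 1\}$.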


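If $\mathcal{B}$ is a $\sigma$-algebra on $V_J$ and $E$ is a set in $V_J$ (not necessarily in $\mathcal{B}$), we define the
$E$-energy of $\mathcal{B}$ to be the quantity

$$\mathcal{E}_E(\mathcal{B}) := \mathbf{E}(|\mathbf{E}(1_E|\mathcal{B})|^2).$$

Clearly, the $E$-energy $\mathcal{E}_E(\mathcal{B})$ ranges between 0 and 1; intuitively, $\mathcal{E}_E(\mathcal{B})$ is a measure
of how much information about $E$ is captured by $\mathcal{B}$, and is thus in many ways
complementary to the e-discrepancy $\Delta_e(E|\mathcal{B})$. From Pythagoras’ theorem we can
verify the identity

$$\mathcal{E}_E(\mathcal{B}') = \mathcal{E}_E(\mathcal{B}) + \mathbf{E}(|\mathbf{E}(1_E|\mathcal{B}') - \mathbf{E}(1_E|\mathcal{B})|^2) \text{ whenever } \mathcal{B} \subseteq \mathcal{B}', \tag{9}$$

thus finer $\sigma$-algebras have larger $E$-energy.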
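
The Pythagoras identity (9) can be verified numerically on a toy example. In the sketch below, an 8-point set stands in for $V_J$, and the coarse and fine $\sigma$-algebras are given by the labelling functions `x // 4` and `x // 2`, which are illustrative assumptions (the fine partition refines the coarse one). The helper names `cond_exp`, `mean` and `energy` are likewise hypothetical.

```python
from fractions import Fraction

def cond_exp(indicator, points, atom_of):
    """E(1_E | B) as a dict, where the atoms of B are the level sets of atom_of."""
    groups = {}
    for x in points:
        groups.setdefault(atom_of(x), []).append(x)
    avg = {k: Fraction(sum(indicator(y) for y in ys), len(ys)) for k, ys in groups.items()}
    return {x: avg[atom_of(x)] for x in points}

def mean(values):
    values = list(values)
    return sum(values, Fraction(0)) / len(values)

def energy(E, points, atom_of):
    """The E-energy of B: the mean square of E(1_E | B)."""
    ce = cond_exp(lambda x: 1 if x in E else 0, points, atom_of)
    return mean(v * v for v in ce.values())

points = list(range(8))
E = {0, 1, 2, 4}
coarse = lambda x: x // 4   # two atoms of size 4
fine = lambda x: x // 2     # four atoms of size 2, refining the coarse ones

ce_coarse = cond_exp(lambda x: 1 if x in E else 0, points, coarse)
ce_fine = cond_exp(lambda x: 1 if x in E else 0, points, fine)
gap = mean((ce_fine[x] - ce_coarse[x]) ** 2 for x in points)

energy_coarse = energy(E, points, coarse)
energy_fine = energy(E, points, fine)
```

Here the coarse energy is $5/16$, the fine energy is $3/8$, and the gap term is $1/16$, in agreement with (9).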

Remark 2.3. In the setting of Example 2.2 with $\mathcal{B} = \mathcal{B}_1 \vee \mathcal{B}_2$ for some $\mathcal{B}_1 \subseteq \mathcal{A}_{\{1\}}$
and $\mathcal{B}_2 \subseteq \mathcal{A}_{\{2\}}$, the energy is a familiar quantity in the theory of the regularity
lemma, and is usually referred to as the index of the partition; see [27].


Let us informally say that a set $E_e \in \mathcal{A}_e$ is e-pseudorandom with respect to $\mathcal{B}$ if the
e-discrepancy $\Delta_e(E_e|\mathcal{B})$ is small. A fundamental fact (which was already exploited
in [26], [27]) is that if $E_e$ is not e-pseudorandom with respect to $\mathcal{B}$, then we can find
a refinement of $\mathcal{B}$ with higher energy and not much larger complexity:

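Lemma 2.4 (Large discrepancy implies energy increment). Let $e \subseteq J$, let $E_e \in \mathcal{A}_e$
be a set, and for each $f \in \partial e$ let $\mathcal{B}_f \subseteq \mathcal{A}_f$ be a $\sigma$-algebra such that

$$\Delta_e\Big(E_e \Big| \bigvee_{f \in \partial e} \mathcal{B}_f\Big) \ge \varepsilon$$

for some $\varepsilon > 0$. Then there exist $\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ for all $f \in \partial e$ such
that

$$\mathrm{complex}(\mathcal{B}'_f) \le \mathrm{complex}(\mathcal{B}_f) + 1 \tag{10}$$

and

$$\mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}'_f\Big) \ge \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}_f\Big) + \varepsilon^2. \tag{11}$$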

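Proof. By (8) (and the finiteness of $V_J$) we can find sets $E_f \in \mathcal{A}_f$ for all $f \in \partial e$
such that

$$\Big| \mathbf{E}\Big( \Big(1_{E_e} - \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}_f\Big)\Big) \prod_{f \in \partial e} 1_{E_f} \Big) \Big| \ge \varepsilon.$$

For each $f \in \partial e$, let $\mathcal{B}'_f$ be the $\sigma$-algebra

$$\mathcal{B}'_f := \mathcal{B}_f \vee \mathcal{B}(E_f),$$

where $\mathcal{B}(E_f)$ denotes the $\sigma$-algebra generated by the single set $E_f$;
then we have $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$, and obtain (10) from (1). Since $\prod_{f \in \partial e} 1_{E_f}$ is measurable
with respect to $\bigvee_{f \in \partial e} \mathcal{B}'_f$, and $1_{E_e} - \mathbf{E}(1_{E_e} | \bigvee_{f \in \partial e} \mathcal{B}'_f)$ has zero conditional
expectation with respect to $\bigvee_{f \in \partial e} \mathcal{B}'_f$, we have

$$\mathbf{E}\Big( \Big(1_{E_e} - \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big)\Big) \prod_{f \in \partial e} 1_{E_f} \Big) = 0$$

and hence

$$\Big| \mathbf{E}\Big( \Big(\mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big) - \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}_f\Big)\Big) \prod_{f \in \partial e} 1_{E_f} \Big) \Big| \ge \varepsilon.$$

By the boundedness of $\prod_{f \in \partial e} 1_{E_f}$ and the Cauchy-Schwarz inequality we conclude

$$\mathbf{E}\Big( \Big| \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big) - \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}_f\Big) \Big|^2 \Big) \ge \varepsilon^2,$$

and (11) then follows from (9). ■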


By iterating Lemma 2.4, one expects to be able to show that any given set $E_e \in \mathcal{A}_e$
must be e-pseudorandom with respect to a $\sigma$-algebra $\mathcal{B}$ of bounded complexity,
since otherwise we could create a tower of $\sigma$-algebras whose energy increments
indefinitely. Such statements can be viewed as $\sigma$-algebra analogues of the Szemerédi
regularity lemma. There are several such lemmas available; the final lemma which
we need is a bit lengthy to state, so we begin by stating some simpler regularity
lemmas which we will then iterate to obtain the stronger lemmas which we need. We
first obtain a preliminary iteration of Lemma 2.4, in which the single set $E_e \in \mathcal{A}_e$
is replaced by an ensemble of sets, or more precisely an ensemble $(\mathcal{B}_e)_{e \in H_d}$ of
$\sigma$-algebras with bounded complexity.
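
Before iterating, it may help to see a single application of Lemma 2.4 in the bipartite setting of Example 2.2. The Python sketch below is a minimal illustration under stated assumptions: the helper names are hypothetical, partitions of $V_1$ and $V_2$ are stored as label dictionaries, and the witness sets $E_1, E_2$ are found by brute force. It refines each partition by the witness set and checks that the energy rises by at least the square of the witnessed discrepancy.

```python
from fractions import Fraction
from itertools import chain, combinations, product

def subsets(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def cond_density(E12, V1, V2, lab1, lab2):
    """E(1_{E12} | B1 v B2) on V1 x V2; lab1 and lab2 label the atoms of B1 and B2."""
    cells = {}
    for a, b in product(V1, V2):
        cells.setdefault((lab1[a], lab2[b]), []).append((a, b))
    dens = {k: Fraction(sum(p in E12 for p in ps), len(ps)) for k, ps in cells.items()}
    return {(a, b): dens[(lab1[a], lab2[b])] for a, b in product(V1, V2)}

def energy(ce):
    """Mean square of the conditional expectation."""
    return sum(v * v for v in ce.values()) / len(ce)

def best_witness(E12, V1, V2, ce):
    """Maximise |E((1_{E12} - ce) 1_{E1} 1_{E2})| over all E1, E2 (brute force)."""
    n = len(V1) * len(V2)
    best = (Fraction(0), (), ())
    for E1 in subsets(V1):
        for E2 in subsets(V2):
            total = sum((((a, b) in E12) - ce[(a, b)] for a in E1 for b in E2), Fraction(0))
            if abs(total) / n > best[0]:
                best = (abs(total) / n, E1, E2)
    return best

V1 = V2 = list(range(4))
E12 = {(a, b) for a in V1 for b in V2 if (a + b) % 2 == 0}  # "parity" graph
lab1 = {a: () for a in V1}  # trivial partition of V1
lab2 = {b: () for b in V2}  # trivial partition of V2

before = cond_density(E12, V1, V2, lab1, lab2)
eps, E1, E2 = best_witness(E12, V1, V2, before)

# Refine each partition by one extra set, so the complexity grows by at most 1.
new1 = {a: lab1[a] + (a in E1,) for a in V1}
new2 = {b: lab2[b] + (b in E2,) for b in V2}
after = cond_density(E12, V1, V2, new1, new2)
increment = energy(after) - energy(before)
```

For the parity graph the witnessed discrepancy is $1/8$, the witness sets are the even vertices on each side, and the energy jumps from $1/4$ to $1/2$, far more than the guaranteed $\varepsilon^2 = 1/64$.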

If $H_d$ is a d-uniform hypergraph, we define $\partial H_d$ to be the $(d-1)$-uniform hypergraph

$$\partial H_d := \bigcup_{e \in H_d} \partial e.$$
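
Lemma 2.5 (Dichotomy between randomness and structure). Let $V = (J, (V_j)_{j \in J}, d, H_d)$
be a hypergraph system. For each $e \in H_d$, let $\mathcal{B}_e \subseteq \mathcal{A}_e$ be a $\sigma$-algebra with the
complexity bounds

$$\mathrm{complex}(\mathcal{B}_e) \le m \text{ for all } e \in H_d$$

for some $m > 0$, and for each $f \in \partial H_d$, let $\mathcal{B}_f \subseteq \mathcal{A}_f$ be a $\sigma$-algebra with the
complexity bounds

$$\mathrm{complex}(\mathcal{B}_f) \le M \text{ for all } f \in \partial H_d$$

for some $M > 0$. Let $\varepsilon, \delta > 0$. Then one of the following statements must hold.

• (Randomness) There exist $\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ for all $f \in \partial H_d$ such
that

$$\mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}'_f\Big) \le \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}_f\Big) + \varepsilon^2 \text{ for all } e \in H_d \text{ and } E_e \in \mathcal{B}_e \tag{12}$$

and

$$\Delta_e\Big(E_e \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big) \le \delta \text{ for all } e \in H_d \text{ and } E_e \in \mathcal{B}_e. \tag{13}$$

• (Structure) There exist $\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ for all $f \in \partial H_d$ such that

$$\mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}'_f\Big) > \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}_f\Big) + \varepsilon^2 \text{ for some } e \in H_d \text{ and } E_e \in \mathcal{B}_e \tag{14}$$

and

$$\mathrm{complex}(\mathcal{B}'_f) \le M + O_{|J|,m,\varepsilon,\delta}(1) \text{ for all } f \in \partial H_d. \tag{15}$$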

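Proof. We run the following algorithm:

• Step 0. Initialize $\mathcal{B}'_f := \mathcal{B}_f$ for all $f \in \partial H_d$. Note that (12) and (15)
currently hold.

• Step 1. If (13) holds, then we halt the algorithm (we are in the “randomness”
half of the dichotomy). Otherwise, there exists an $e \in H_d$ and $E_e \in \mathcal{B}_e$
such that

$$\Delta_e\Big(E_e \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big) > \delta.$$

We can then invoke Lemma 2.4 to locate refinements $\mathcal{B}'_f \subseteq \mathcal{B}''_f \subseteq \mathcal{A}_f$ for all
$f \in \partial H_d$ (note that $\mathcal{B}''_f$ will just equal $\mathcal{B}'_f$ if $f \not\subset e$) such that

$$\mathrm{complex}(\mathcal{B}''_f) \le \mathrm{complex}(\mathcal{B}'_f) + 1 \text{ for all } f \in \partial H_d$$

and

$$\mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}''_f\Big) \ge \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}'_f\Big) + \delta^2.$$

• Step 2. We replace $\mathcal{B}'_f$ with $\mathcal{B}''_f$ for all $f \in \partial H_d$. If (12) fails (i.e. (14)
holds), then we halt the algorithm (we are in the “structure” half of the
dichotomy). Otherwise, we return to Step 1.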

Observe that every time we return from Step 2 to Step 1, the quantity

$$\sum_{e \in H_d} \sum_{E_e \in \mathcal{B}_e} \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}'_f\Big)$$

increases by at least $\delta^2$. On the other hand, if this quantity ever increases by more
than $|H_d| 2^{2^m} \varepsilon^2 = O_{|J|,m,\varepsilon}(1)$, then by (2) and the pigeonhole principle (12) will
necessarily fail. Since we only return to Step 1 when (12) holds, we see that the
algorithm can only iterate at most $O_{|J|,m,\varepsilon,\delta}(1)$ times. Thus when we terminate we
must have (15). The claim then follows. ■
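
The energy-increment loop behind this algorithm can be sketched in the simplest case of a single bipartite graph, as in Example 2.2. The Python sketch below is a deliberately simplified illustration, not the algorithm of the proof: it tracks one set $E_{12}$ rather than a family $\mathcal{B}_e$, it omits the "structure" exit governed by (12) and (14), and it finds witness sets by brute force. The function name `regularize` and the label-dictionary representation of partitions are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import chain, combinations, product

def subsets(items):
    items = list(items)
    return chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))

def cond_density(E12, V1, V2, lab1, lab2):
    """E(1_{E12} | B1 v B2) on V1 x V2; lab1 and lab2 label the atoms of B1 and B2."""
    cells = {}
    for a, b in product(V1, V2):
        cells.setdefault((lab1[a], lab2[b]), []).append((a, b))
    dens = {k: Fraction(sum(p in E12 for p in ps), len(ps)) for k, ps in cells.items()}
    return {(a, b): dens[(lab1[a], lab2[b])] for a, b in product(V1, V2)}

def regularize(E12, V1, V2, delta):
    """Refine partitions of V1 and V2 until the discrepancy of E12 is at most delta."""
    lab1 = {a: () for a in V1}  # Step 0: start from the trivial partitions
    lab2 = {b: () for b in V2}
    n = len(V1) * len(V2)
    steps = 0
    while True:
        ce = cond_density(E12, V1, V2, lab1, lab2)
        witness = None
        # Step 1: look for sets E1, E2 witnessing discrepancy larger than delta.
        for E1, E2 in product(list(subsets(V1)), list(subsets(V2))):
            total = sum((((a, b) in E12) - ce[(a, b)] for a in E1 for b in E2), Fraction(0))
            if abs(total) / n > delta:
                witness = (E1, E2)
                break
        if witness is None:
            return lab1, lab2, steps  # discrepancy is at most delta: halt
        # Step 2: refine each partition by the witness set (one extra generator each).
        E1, E2 = witness
        lab1 = {a: lab1[a] + (a in E1,) for a in V1}
        lab2 = {b: lab2[b] + (b in E2,) for b in V2}
        steps += 1

V1 = V2 = list(range(4))
E12 = {(a, b) for a in V1 for b in V2 if (a + b) % 2 == 0}
delta = Fraction(1, 10)
lab1, lab2, steps = regularize(E12, V1, V2, delta)
```

Each pass through the loop raises the energy by more than $\delta^2$, and the energy never exceeds 1, so the loop runs fewer than $1/\delta^2$ times. On the parity graph it halts after a single refinement into even and odd vertices.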

We now iterate Lemma 2.5 to obtain the following preliminary regularity lemma.
Define a growth function to be an increasing function $F : \mathbf{R}^+ \to \mathbf{R}^+$ such that
$F(x) \ge 1 + x$ for all $x$.

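Lemma 2.6 (Preliminary regularity lemma). Let $V = (J, (V_j)_{j \in J}, d, H_d)$ be a hypergraph
system. For each $e \in H_d$ let $\mathcal{B}_e \subseteq \mathcal{A}_e$ be a $\sigma$-algebra, and suppose that we
have the bound

$$\mathrm{complex}(\mathcal{B}_e) \le m \text{ for all } e \in H_d$$

for some $m > 0$. Let $\varepsilon > 0$, and let $F$ be a growth function (possibly depending on
$\varepsilon$). Then there exists $M > 0$, and for each $f \in \partial H_d$ there exists a pair of $\sigma$-algebras
$\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ such that we have the estimates

$$F(m) \le M \le O_{|J|,\varepsilon,m,F}(1) \tag{16}$$

$$\mathrm{complex}(\mathcal{B}_f) \le M \text{ for all } f \in \partial H_d \tag{17}$$

$$\mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}'_f\Big) - \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}_f\Big) \le \varepsilon^2 \text{ for all } e \in H_d, E_e \in \mathcal{B}_e \tag{18}$$

$$\Delta_e\Big(E_e \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big) \le \frac{1}{F(M)} \text{ for all } e \in H_d, E_e \in \mathcal{B}_e \tag{19}$$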


Remark 2.7. Lemma 2.6 provides a coarse low-order approximation $(\mathcal{B}_f)_{f \in \partial H_d}$ and
a fine low-order approximation $(\mathcal{B}'_f)_{f \in \partial H_d}$ to the high-order $\sigma$-algebras $(\mathcal{B}_e)_{e \in H_d}$.
The coarse approximation has bounded complexity, the fine approximation is close
to the coarse approximation in an $L^2$ sense, and the high order $\sigma$-algebras are
pseudorandom with respect to the fine approximation. The key point here is that
the discrepancy control on the fine approximation given by (19) is superior to
the complexity control on the coarse approximation given by (17) by an arbitrary
growth function $F$. If one were to try to use a single approximation instead of
a pair of coarse and fine approximations, it appears impossible to obtain such a
crucial gain.


Proof. We perform the following iteration.

• Step 0. Initialize $\mathcal{B}_f = \{\emptyset, V_J\}$ to be the trivial $\sigma$-algebra for all $f \in \partial H_d$,
thus $\mathcal{B}_f$ has complexity 0 initially.

• Step 1. Set $M := \max(F(m), \sup_{f \in \partial H_d} \mathrm{complex}(\mathcal{B}_f))$, and $\delta := 1/F(M)$.
We apply Lemma 2.5, and end up in either the randomness or structure
half of the dichotomy. In either case we generate $\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$
for each $f \in \partial H_d$.

• Step 2. If we are in the randomness half of the dichotomy, we terminate
the algorithm. Otherwise, if we are in the structure half of the dichotomy,
we replace $\mathcal{B}_f$ with $\mathcal{B}'_f$ for each $f \in \partial H_d$, and return to Step 1.

Observe that every time we return from Step 2 to Step 1, the quantity

$$\sum_{e \in H_d} \sum_{E_e \in \mathcal{B}_e} \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}_f\Big)$$

increases by at least $\varepsilon^2$. On the other hand, this quantity is non-negative and does
not exceed $|H_d| 2^{2^m} = O_{|J|,m}(1)$ thanks to (2). Thus this algorithm terminates
after $O_{|J|,m,\varepsilon}(1)$ steps. By (15), we see that at each of these steps, the quantity $M$
increases to be at most $M + O_{|J|,m,\varepsilon,\delta}(1)$ with $\delta = 1/F(M)$, while initially $M$ is equal to $F(m)$.
Thus at the end of the algorithm we have (16) as desired. The remaining claims
(17), (18), (19) follow from construction (and (12), (13)). ■

Remark 2.8. Lemma 2.6 already implies the Szemerédi regularity lemma in its
usual form (and with the usual tower-exponential bounds); see [28] for further
discussion. The above lemma is also similar in spirit to the modern regularity
lemmas that appear for instance in [17] (except for an issue of obtaining regularity
at all orders less than $d$, which we shall address in Lemma 2.9 below). In such
lemmas, the objective is not to obtain a partition for which the original graph
or hypergraph is regular, but instead to obtain a partition for which a modified
graph or hypergraph is very regular, where the modification consists of adding or
subtracting a small number of edges. The analogue of such a modification in our
context is the decomposition

$$1_{E_e} = F_{\mathrm{regular}} + F_{\mathrm{small}}$$

where

$$F_{\mathrm{regular}} := \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}_f\Big) + \Big(1_{E_e} - \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big)\Big)$$

and

$$F_{\mathrm{small}} := \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big) - \mathbf{E}\Big(1_{E_e} \Big| \bigvee_{f \in \partial e} \mathcal{B}_f\Big).$$

The function $F_{\mathrm{small}}$ is small thanks to (18) and (9). Now consider $F_{\mathrm{regular}}$. On a
typical atom of $\bigvee_{f \in \partial e} \mathcal{B}_f$, the first term is constant, and the second term is going
to be very pseudorandom (have small correlation with sets of the form $\bigcap_{f \in \partial e} E_f$
for $E_f \in \mathcal{A}_f$) thanks to (19) and (8).


Lemma 2.6 regularizes the $\sigma$-algebras $\mathcal{B}_e$ on the d-uniform hypergraph $H_d$ in terms
of $\sigma$-algebras $\mathcal{B}_f$, $\mathcal{B}'_f$ on the $(d-1)$-uniform hypergraph $\partial H_d$. However it does not
regularize the $\sigma$-algebras on $\partial H_d$. This can be accomplished by one final iteration,
which gives our final regularity lemma (which is essentially the same lemma^7 as
that in [11], [19], or [17]).

^7 In contrast, the earlier regularity lemmas of Chung [2] and Frankl-Rödl [4] are closer to Lemma
2.6, with $\partial H_d$ generalized to $\partial^l H_d$ for any fixed $l$. The case $l = d - 1$ in particular is essentially
a routine generalization of the ordinary regularity lemma and appears to have been folklore for
quite some time.
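
Lemma 2.9 (Full regularity lemma). Let $V = (J, (V_j)_{j \in J}, d, H_d)$ be a hypergraph
system, and define the j-uniform hypergraphs $H_j$ for all $0 \le j < d$ recursively
backwards from $j = d$ by the formula $H_j := \partial H_{j+1}$. (In particular, if $H_d$ is non-empty
then we have $H_0 = \{\emptyset\}$.) For all $e \in H_d$ let $\mathcal{B}_e \subseteq \mathcal{A}_e$ be a $\sigma$-algebra, and
suppose that we have the bound

$$\mathrm{complex}(\mathcal{B}_e) \le M_d \text{ for all } e \in H_d$$

for some $M_d > 0$. Let $F$ be a growth function. Then there exist numbers

$$M_d < F(M_d) \le M_{d-1} < F(M_{d-1}) \le \ldots \le M_0 < F(M_0) \le O_{|J|,M_d,F}(1) \tag{20}$$

and for each $0 \le j < d$ and $f \in H_j$ there exist $\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$, such that
we have the estimates

$$\mathrm{complex}(\mathcal{B}_f) \le M_j \text{ for all } 0 \le j < d, f \in H_j \tag{21}$$

$$\mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}'_f\Big) - \mathcal{E}_{E_e}\Big(\bigvee_{f \in \partial e} \mathcal{B}_f\Big) \le \frac{1}{F(M_j)^2} \text{ for all } 1 \le j \le d, e \in H_j, E_e \in \mathcal{B}_e \tag{22}$$

$$\Delta_e\Big(E_e \Big| \bigvee_{f \in \partial e} \mathcal{B}'_f\Big) \le \frac{1}{F(M_0)} \text{ for all } 1 \le j \le d, e \in H_j, E_e \in \mathcal{B}_e. \tag{23}$$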

Remark 2.10. At every order $0 < j \le d$, Lemma 2.9 gives coarse and fine approximations
$(\mathcal{B}_f)_{f \in H_{j-1}}$, $(\mathcal{B}'_f)_{f \in H_{j-1}}$ at the $(j-1)$-uniform level to the $\sigma$-algebras
$(\mathcal{B}_e)_{e \in H_j}$ at the j-uniform level. As one goes down in order, the $\sigma$-algebras rapidly
become more complex^8 (though lower order, of course). However, the bounds in
(22) and (23) will keep apace with this growth in complexity (see [17] for some
related discussion concerning the desirability of having the constants grow along
such a hierarchy). Indeed the bound (23) is extremely strong, as $F(M_0)$ dominates
all the other quantities which appear in the above lemma; it is effectively as
if the fine approximation was perfectly accurate (so that $1_{E_e}$ is approximable by
$\mathbf{E}(1_{E_e} | \bigvee_{f \in \partial e} \mathcal{B}'_f)$ with only negligible error). The main remaining difficulty when
using this lemma is to exploit the estimate (22) measuring the gap between the
coarse and fine approximations; one has to take some care here because the error
bound $1/F(M_j)^2$ here safely exceeds the complexity^9 of the higher-order objects
$(\mathcal{B}_e)_{e \in H_j}$, but not that of the lower-order objects $(\mathcal{B}_e)_{e \in H_{j-1}}$.


Proof. We induct on $d$ (keeping $J$ fixed); the implicit constants in (20) will change
when one does this, but the induction will only run for at most $|J|$ steps and so
this will not cause a difficulty. When $d = 0$ the claim is trivial (and the claim (21)
has an enormous amount of room available!) so assume that $d \ge 1$ and the claim
has already been proven for all smaller $d$. We will need a growth function $F^{\mathrm{fast}}$ to
be chosen later; as the name suggests, this function will grow substantially faster
than $F$, in particular we assume $F^{\mathrm{fast}}(n) \ge F(n)$ for all $n$. Applying Lemma 2.6
with $m$ equal to $M_d$, with $\varepsilon$ equal to $1/F(M_d)$, and the growth function $F^{\mathrm{fast}}$, we

^8 At the zeroth order $j = 0$, all $\sigma$-algebras have complexity zero, but this is a degenerate
exception to the above general rule.

^9 We will only need to bound the complexity of the coarse algebras $\mathcal{B}_e$. Some (very weak)
bounds on the complexity of the fine algebras are available but they seem to be useless for
applications and so we have not stated them explicitly here.
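can create $\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ for all $f \in H_{d-1}$ and a quantity $M_{d-1}$ such
that

$$F(M_d) \le F^{\mathrm{fast}}(M_d) \le M_{d-1} \le O_{|J|,M_d,F,F^{\mathrm{fast}}}(1) \tag{24}$$

$$\mathrm{complex}(\mathcal{B}_f) \le M_{d-1} \quad \text{for all } f \in H_{d-1}$$

$$\mathcal{E}_e\Bigl(E_e \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) - \mathcal{E}_e\Bigl(E_e \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr) \le \frac{1}{F(M_d)^2} \quad \text{for all } e \in H_d,\ E_e \in \mathcal{B}_e$$

$$\Delta_e\Bigl(E_e \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) \le \frac{1}{F^{\mathrm{fast}}(M_{d-1})} \quad \text{for all } e \in H_d,\ E_e \in \mathcal{B}_e. \tag{25}$$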


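Now we apply the induction hypothesis with $d$ replaced by $d-1$, and $H_d$ replaced
by $H_{d-1}$. This generates numbers

$$M_{d-1} \le F(M_{d-1}) \le \dots \le M_0 \le F(M_0) \le O_{|J|,M_{d-1},F}(1) \tag{26}$$

and for each $0 \le j < d-1$ and $f \in H_j$ there exist $\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$, such
that we have the estimates

$$\mathrm{complex}(\mathcal{B}_f) \le M_j \quad \text{for all } 0 \le j \le d-1,\ f \in H_j$$

$$\mathcal{E}_e\Bigl(E_e \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) - \mathcal{E}_e\Bigl(E_e \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr) \le \frac{1}{F(M_j)^2} \quad \text{for all } 1 \le j \le d-1,\ e \in H_j,\ E_e \in \mathcal{B}_e$$

$$\Delta_e\Bigl(E_e \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) \le \frac{1}{F(M_0)} \quad \text{for all } 1 \le j \le d-1,\ e \in H_j,\ E_e \in \mathcal{B}_e.$$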

Comparing this with the conclusion of Lemma 2.9, we see that we can obtain all
the claims we need except for (23) when $j = d$, as well as the final bound in (20).
To obtain (23), we see from (25) that it would suffice to ensure that

$$F^{\mathrm{fast}}(M_{d-1}) \ge F(M_0).$$

But since $F(M_0) = O_{|J|,M_{d-1},F}(1)$, this can be achieved simply by choosing the
growth function $F^{\mathrm{fast}}$ sufficiently large and rapidly increasing depending on
$F$ and $|J|$. By (26), (24), we then have

$$F(M_0) = O_{|J|,M_{d-1},F}(1) = O_{|J|,M_d,F,F^{\mathrm{fast}}}(1) = O_{|J|,M_d,F}(1)$$

and the claim (20) follows. ■


Remark 2.11. The dependence of constants here is quite terrible. Typically $F$ will
be an exponential function. In the graph case $d = 2$ one can take $M_0$ to be a
tower of exponentials, whose height is bounded by some polynomial of $F(M_2)$; a
modification of the arguments in [9] shows that this tower bound is essentially best
possible. However, for $d = 3$, both $M_0$ and $M_1$ will be an iterated tower of
exponentials of iterated height equal to a polynomial in $F(M_3)$, basically because of the
need for $F^{\mathrm{fast}}$ to exceed the bounds one obtains from the $d = 2$ case. The situation
of course gets even worse for larger values of $d$, though for any fixed $d$ the bounds
are still primitive recursive. As stated earlier, the complexity bounds for the fine
approximations $\mathcal{B}'_f$ will be even worse than this, perhaps by yet another layer of
iteration. Nevertheless, this regularity lemma is still sufficient for applications in
which one is willing to have only qualitative control on the error terms (e.g. $o(1)$
type bounds) rather than quantitative control. (As we shall see in [29], obtaining
infinitely many constellations in the Gaussian primes will be one such application.)
In view of recent results on effective bounds on Szemerédi-type theorems (see e.g.
[10], [23]) it seems quite possible that these very rapidly growing bounds, while perhaps
necessary in order to have a regularity lemma, are not needed for the hypergraph removal
lemma.


3. Statement of counting lemma

As is customary in these arguments, the regularity lemma must be complemented
with a counting lemma in order for it to be applicable to proving results such as
Theorem 1.13. In the $\sigma$-algebra language, the setup is as follows. Suppose we
start with $\sigma$-algebras $(\mathcal{B}_e)_{e \in H_d}$ as in the hypotheses of Lemma 2.9. Then, among
other things, this lemma yields further $\sigma$-algebras $(\mathcal{B}_e)_{e \in H_j}$ for $0 \le j < d$, each of
which has some complexity bound. Combining all of these $\sigma$-algebras together, one
obtains a somewhat large (but still bounded complexity) $\sigma$-algebra $\bigvee_{e \in H} \mathcal{B}_e$, where
$H := \bigcup_{0 \le j \le d} H_j$. In particular, if $E_e$ are sets in $\mathcal{B}_e$ for all $e \in H_d$, then $\bigcap_{e \in H_d} E_e$
is the union of atoms in $\bigvee_{e \in H} \mathcal{B}_e$. Here, of course, an atom of a $\sigma$-algebra $\mathcal{B}$ is
a non-empty set in $\mathcal{B}$ of minimal size; since the ambient space $V_J$ is finite, every
point is contained in exactly one atom of $\mathcal{B}$.
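The notion of an atom can be made concrete with a small computation. The following is a minimal sketch, assuming a finite ground set and a $\sigma$-algebra given by a finite list of generating sets; the function name `atoms` and the toy data are illustrative only and do not come from the paper. Two points lie in the same atom exactly when they belong to the same generators, so a $\sigma$-algebra generated by $M$ sets has at most $2^M$ atoms.

```python
def atoms(universe, generators):
    """Atoms of the sigma-algebra on `universe` generated by `generators`.

    Each point is filed under its membership pattern across the generators;
    points sharing a pattern cannot be separated by any set in the algebra.
    """
    cells = {}
    for x in universe:
        signature = tuple(x in g for g in generators)
        cells.setdefault(signature, set()).add(x)
    return [frozenset(cell) for cell in cells.values()]


# Hypothetical toy example: an 8-point space and two generating sets.
universe = range(8)
generators = [{0, 1, 2, 3}, {2, 3, 4, 5}]
parts = atoms(universe, generators)
```

The atoms returned here partition the ground set, which is the finite-space statement that every point is contained in exactly one atom.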

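Roughly speaking, the counting lemma we give below (Lemma 3.4) gives a formula
for computing the probability of atoms in $\bigvee_{e \in H} \mathcal{B}_e$, or at least those atoms which
are “good”. It can be informally described as follows. For each $e \in H$, let $A_e$ be
an atom of $\mathcal{B}_e$, thus $\bigcap_{e \in H} A_e$ is an atom of $\bigvee_{e \in H} \mathcal{B}_e$ (if it is non-empty). The
counting lemma then says that under most circumstances we have the approximate
formula^10

$$\mathbf{E}\Bigl(\prod_{e \in H} 1_{A_e}\Bigr) \approx \prod_{e \in H} \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigcap_{f \in \partial e} A_f\Bigr) \tag{27}$$

where we use $\mathbf{E}(f \mid A)$ to denote the conditional expectation

$$\mathbf{E}(f \mid A) := \frac{1}{|A|} \sum_{x \in A} f(x).$$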

This can be viewed as an assertion that higher order atoms $A_e$ are approximately
independent of each other, conditioning on lower order atoms $A_f$, although this
heuristic is somewhat difficult to formulate precisely. In particular, if we
remove those “bad” atoms $\bigcap_{e \in H} A_e$ for which $\mathbf{E}(1_{A_e} \mid \bigcap_{f \in \partial e} A_f)$ is small for at least
one $e \in H$, then all the remaining non-empty atoms will have fairly large size. Thus
if the set $\bigcap_{e \in H_d} E_e$ has very small size, then after removing all the bad atoms we
expect this set to in fact be empty. This is the strategy behind proving Theorem
1.13.
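In the simplest graph case the heuristic (27) says that the triangle density of a sufficiently regular tripartite graph is close to the product of its three edge densities. The sketch below illustrates this numerically under assumptions that are not part of the paper: the three bipartite graphs are sampled independently at random (so they are regular with high probability), the lower-order atoms are trivial, and the names `triangle_density_vs_product`, `n`, `p` and `seed` are hypothetical.

```python
import random


def triangle_density_vs_product(n, p, seed=0):
    """Compare triangle density with the product of edge densities.

    Builds a random tripartite graph on three vertex classes of size n,
    keeping each potential edge independently with probability p.
    Returns (triangle density, product of the three edge densities).
    """
    rng = random.Random(seed)

    def random_bipartite():
        return [[rng.random() < p for _ in range(n)] for _ in range(n)]

    e12, e23, e31 = random_bipartite(), random_bipartite(), random_bipartite()

    def density(e):
        return sum(map(sum, e)) / n**2

    # Count triples (x, y, z) with all three edges present.
    triangles = sum(
        1
        for x in range(n)
        for y in range(n) if e12[x][y]
        for z in range(n) if e23[y][z] and e31[z][x]
    )
    return triangles / n**3, density(e12) * density(e23) * density(e31)


lhs, rhs = triangle_density_vs_product(30, 0.5)
```

For an irregular graph the two quantities can differ substantially, which is why the counting lemma is restricted to good atoms.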

We now formalize the above discussion. We begin by describing the good atoms. 
Informally speaking, the good atoms are going to be those which are fairly large (at 

^10 The reader may wish to interpret $\mathbf{E}(1_A)$ as being the “probability” of the “event” $A$, thus
for instance $\mathbf{E}(\prod_{e \in H} 1_{A_e})$ is the probability of the joint event $\bigcap_{e \in H} A_e$. Similarly, many of the
arguments in the sequel also have a strongly probabilistic flavour.
all orders) and also fairly regular (at all orders). This is consistent with previous
experience with counting lemmas (say in the graph case), in which one must first 
throw away all cells of the partition which are too small (or have too few edges), 
as well as all pairs of cells for which the graph is irregular, before one can obtain a 
useful estimate for (say) the number of triangles in a graph. 

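Definition 3.1 (Good atoms). Let the notation, assumptions, and conclusions be
as in Lemma 2.9, let $H := \bigcup_{0 \le j \le d} H_j$, and let $\bigcap_{e \in H} A_e$ be a (possibly empty)
atom of $\bigvee_{e \in H} \mathcal{B}_e$, where for each $e \in H$, $A_e$ is an atom of $\mathcal{B}_e$. We say that this
atom is good if for all $0 \le j \le d$ and $e \in H_j$ we have the largeness estimates

$$\mathbf{E}\Bigl(1_{A_e} \prod_{f \in \partial e} 1_{A_f}\Bigr) \ge \frac{1}{\log F(M_j)}\, \mathbf{E}\Bigl(\prod_{f \in \partial e} 1_{A_f}\Bigr) \tag{28}$$

as well as the regularity estimates

$$\mathbf{E}\Bigl(\Bigl|\mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr)\Bigr|^2 \prod_{f \subsetneq e} 1_{A_f}\Bigr) \le \frac{1}{F(M_j)}\, \mathbf{E}\Bigl(\prod_{f \subsetneq e} 1_{A_f}\Bigr). \tag{29}$$

Remark 3.2. While the definition of a good atom allows for $\bigcap_{e \in H} A_e$ to be empty,
the counting lemma we prove below will show that in fact good atoms are always
non-empty (assuming $F$ is sufficiently rapid). The reader should not take the
logarithmic factor in (28) too seriously; the point is that $\log F(M_j)$ is smaller than
any power of $F(M_j)$ but still much larger than any given function of $M_j$.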


One can easily verify that most atoms are good in the following sense. For any
$0 \le j \le d$, $e \in H_j$, and any atom $A_e$ of $\mathcal{B}_e$, let $B_{e,A_e}$ be the union of all the sets
$\bigcap_{f \subsetneq e} A_f$ for which (28) or (29) fails. We remark for future reference that the set
$B_{e,A_e}$ lies in $\bigvee_{f \subsetneq e} \mathcal{B}_f$. Note also that if the atom $\bigcap_{e \in H} A_e$ is not good, then there
exists $e \in H$ such that $\bigcap_{e' \in H} A_{e'} \subseteq B_{e,A_e}$.

Lemma 3.3 (Most atoms are good). Let the notation, assumptions, and conclusions
be as in Lemma 2.9 and Definition 3.1. For any $0 \le j \le d$, $e \in H_j$, and any
atom $A_e$ of $\mathcal{B}_e$, we have $\mathbf{E}(1_{A_e} 1_{B_{e,A_e}}) = O(1/\log F(M_j))$.


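Proof. Consider the contribution to $\mathbf{E}(1_{A_e} 1_{B_{e,A_e}})$ from the case where (28) fails.
This contribution is bounded by^11

$$\sum_{(A_f)_{f \in \partial e} \text{ atoms in } (\mathcal{B}_f)_{f \in \partial e}:\ (28) \text{ fails}} \mathbf{E}\Bigl(1_{A_e} \prod_{f \in \partial e} 1_{A_f}\Bigr)$$

which by failure of (28) is bounded by

$$\sum_{(A_f)_{f \in \partial e} \text{ atoms in } (\mathcal{B}_f)_{f \in \partial e}} \frac{1}{\log F(M_j)}\, \mathbf{E}\Bigl(\prod_{f \in \partial e} 1_{A_f}\Bigr) = \frac{1}{\log F(M_j)}.$$

Next, consider the contribution to $\mathbf{E}(1_{A_e} 1_{B_{e,A_e}})$ arising from the case when (29)
fails. The total contribution of this case is at most

$$\sum_{(A_f)_{f \subsetneq e}:\ (29) \text{ fails}} \mathbf{E}\Bigl(\prod_{f \subsetneq e} 1_{A_f}\Bigr)$$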

^11 Note that (28) depends only on those $A_f$ for which $f \in \partial e$, as opposed to the larger class of
events $A_f$ for which $f \subsetneq e$.
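which by failure of (29) is at most

$$F(M_j) \sum_{(A_f)_{f \subsetneq e}} \mathbf{E}\Bigl(\Bigl|\mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr)\Bigr|^2 \prod_{f \subsetneq e} 1_{A_f}\Bigr)$$

which in turn is at most

$$F(M_j)\, \mathbf{E}\Bigl(\Bigl|\mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr)\Bigr|^2\Bigr).$$

But by (9), (22) we have

$$\mathbf{E}\Bigl(\Bigl|\mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr)\Bigr|^2\Bigr) \le \frac{1}{F(M_j)^2}.$$

Combining all of these estimates, the claim follows.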


We can now state the counting lemma; closely related results appear in the work 
of Gowers [10], Nagle, Rodl, and Schacht [15], and Rodl and Schacht [17]. 

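Lemma 3.4 (Counting lemma). Let the notation, assumptions, and conclusions
be as in Lemma 2.9 and Definition 3.1, let $H := \bigcup_{0 \le j \le d} H_j$, and let $\bigcap_{e \in H} A_e$
be a good atom of $\bigvee_{e \in H} \mathcal{B}_e$. Then, if the growth function $F$ is sufficiently rapid
depending on $|J|$, we have that $\bigcap_{e \in H} A_e$ is non-empty, and more precisely

$$\mathbf{E}\Bigl(\prod_{e \in H} 1_{A_e}\Bigr) = \bigl(1 + o_{M_d \to \infty; |J|}(1)\bigr) \prod_{e \in H} \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigcap_{f \in \partial e} A_f\Bigr) + O_{|J|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr)$$

(compare with (27)).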


This lemma is a little lengthy (though straightforward) to prove, and we defer it 
to the next section. Let us assume it for now, and conclude the proof of Theorem 
1.13. 

Proof [of Theorem 1.13 assuming Lemma 3.4]. Let $V = (J, (V_j)_{j \in J}, d, H_d)$, $(E_e)_{e \in H_d}$,
$\delta$ be as in Theorem 1.13. We define $H_j$ recursively for $0 \le j < d$ by setting
$H_j := \partial H_{j+1}$, and then set $H := \bigcup_{0 \le j \le d} H_j$. For any $e \in H_d$, we set $\mathcal{B}_e := \mathcal{B}(E_e)$,
thus each $\mathcal{B}_e$ has complexity at most 1. Let $M_d \ge 1$ be a quantity to be chosen
later, and let $F$ be a growth function depending on $|J|$ (but not on $\delta$) to be chosen
later. We apply the regularity lemma, Lemma 2.9, to obtain quantities (20) and
$\sigma$-algebras $\mathcal{B}_f \subseteq \mathcal{B}'_f \subseteq \mathcal{A}_f$ for all $f \in H \setminus H_d$.

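Suppose that $\bigcap_{e \in H} A_e$ is a (possibly empty) atom of $\bigvee_{e \in H} \mathcal{B}_e$ such that $A_e = E_e$
for $e \in H_d$. If this atom is good, then by the counting lemma (Lemma 3.4) and
Definition 3.1 we have

$$\mathbf{E}\bigl(1_{\bigcap_{e \in H} A_e}\bigr) \ge \bigl(1 + o_{M_d \to \infty; |J|}(1)\bigr) \prod_{0 \le j \le d} \Bigl(\frac{1}{\log F(M_j)}\Bigr)^{|H_j|} + O_{|J|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr)$$

if $F$ is sufficiently rapid depending on $|J|$. Using (20), we thus see that (if $M_d$ is
sufficiently large depending on $|J|$)

$$\mathbf{E}\bigl(1_{\bigcap_{e \in H} A_e}\bigr) \ge c(|J|, M_d, F)$$

for some $c(|J|, M_d, F) > 0$. On the other hand, $\bigcap_{e \in H} A_e$ is contained in $\bigcap_{e \in H_d} E_e$,
which has density at most $\delta$ by the hypothesis (3). Thus if $\delta$ is sufficiently small
depending on $|J|$, $M_d$, $F$, we see that no atom $\bigcap_{e \in H} A_e$ with $A_e = E_e$ for all $e \in H_d$
can possibly be good.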

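Now let $B_{e,A_e}$ be as in Lemma 3.3. Let us define

$$E'_e := V_J \setminus \Bigl(B_{e,E_e} \cup \bigcup_{f \subsetneq e} \bigcup_{A_f} \bigl(A_f \cap B_{f,A_f}\bigr)\Bigr)$$

for all $e \in H_d$, where for brevity we adopt the convention that $A_f$ is always
understood to range over the atoms of $\mathcal{B}_f$. Then we observe that $E'_e \in \bigvee_{f \subsetneq e} \mathcal{B}_f$. The
claims (6), (7) then follow from (21). Also, from Lemma 3.3, (21) we see that for
any $e \in H_d$,

$$\begin{aligned}
\mathbf{E}(1_{E_e \setminus E'_e}) &\le \mathbf{E}(1_{E_e} 1_{B_{e,E_e}}) + \sum_{f \subsetneq e} \sum_{A_f} \mathbf{E}(1_{A_f} 1_{B_{f,A_f}}) \\
&\le O\bigl(1/\log F(M_d)\bigr) + \sum_{0 \le j < d} \sum_{f \in H_j} \sum_{A_f} O\bigl(1/\log F(M_j)\bigr) \\
&\le O\bigl(1/\log F(M_d)\bigr) + \sum_{0 \le j < d} \sum_{f \in H_j} O_{M_j}\bigl(1/\log F(M_j)\bigr) \\
&\le \sup_{0 \le j \le d} O_{M_j,|J|}\bigl(1/\log F(M_j)\bigr).
\end{aligned}$$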

If one chooses $F$ sufficiently rapidly growing (depending only on $|J|$), we conclude
from (20) that we have

$$\mathbf{E}(1_{E_e \setminus E'_e}) = o_{M_d \to \infty; |J|}(1).$$

By choosing $M_d$ sufficiently large depending on $|J|$, and then letting $\delta$ be sufficiently
small depending on $M_d$ and $|J|$, we conclude (5).

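The final thing to verify is (4). To see this, first observe that the set $\bigcap_{e \in H_d} E'_e$ lies in
$\bigvee_{f \in H \setminus H_d} \mathcal{B}_f$ and is thus the union of atoms of the form $\bigcap_{f \in H \setminus H_d} A_f$. Suppose for
contradiction that $\bigcap_{e \in H_d} E'_e$ contains a non-empty atom of the form $\bigcap_{f \in H \setminus H_d} A_f$.
Set $A_e := E_e$ for $e \in H_d$. By the preceding discussion we know that $\bigcap_{e \in H} A_e$
cannot be good, thus there exists an $f' \in H$ such that $\bigcap_{g \subsetneq f'} A_g$ lies in $B_{f',A_{f'}}$.
From construction of $H$, there exists $e \in H_d$ which contains $f'$. But then by
definition of $E'_e$, $\bigcap_{f \in H \setminus H_d} A_f$ cannot lie in $E'_e$, contradiction. Thus $\bigcap_{e \in H_d} E'_e$ is
empty, which is (4), and Theorem 1.13 follows. ■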


It remains to prove the counting lemma. This will be accomplished in the next 
section. 
4. Proof of counting lemma

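We now prove Lemma 3.4. Fix a good collection $(A_e)_{e \in H}$ of atoms. We introduce
the numbers $p_e \in \mathbf{R}$, the functions $b_e, c_e : V_J \to \mathbf{R}$, and the sets $A_{<e} \subseteq V_J$ for all
$e \in H$ by the formulae

$$p_e := \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigcap_{f \in \partial e} A_f\Bigr)$$

$$b_e := \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr) - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr)$$

$$c_e := 1_{A_e} - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr)$$

$$A_{<e} := \bigcap_{f \subsetneq e} A_f.$$

Note that we have not yet shown that $\bigcap_{f \in \partial e} A_f$ is non-empty; for now, let us just
assign an arbitrary value to $p_e$ (e.g. $p_e = 1$) when $\bigcap_{f \in \partial e} A_f$ is empty. We thus
have the decomposition

$$1_{A_e} = p_e + b_e + c_e \tag{30}$$

on the set $\bigcap_{f \in \partial e} A_f$. One should think of the constant $p_e$ as the main term, and
the other two terms as error terms. The $c_e$ error term will be very easy to handle,
whereas the $b_e$ error term will cause somewhat more difficulty. Since $(A_e)_{e \in H}$ is
good, we have the estimates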



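$$p_e \ge \frac{1}{\log F(M_j)} \quad \text{for all } 0 \le j \le d \text{ and } e \in H_j \tag{31}$$

and

$$\mathbf{E}\bigl(|b_e|^2\, 1_{A_{<e}}\bigr) \le \frac{1}{F(M_j)}\, \mathbf{E}\bigl(1_{A_{<e}}\bigr) \quad \text{for all } 0 \le j \le d \text{ and } e \in H_j. \tag{32}$$

From (23) and (8), we also have

$$\Bigl|\mathbf{E}\Bigl(c_e \prod_{f \in \partial e} 1_{E_f}\Bigr)\Bigr| \le \frac{1}{F(M_0)} \quad \text{whenever } E_f \in \mathcal{A}_f \text{ for } f \in \partial e. \tag{33}$$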
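As a check on the decomposition (30), the following short computation (a sketch using only the definitions of $p_e$, $b_e$, $c_e$ given above, with nothing further assumed) shows why the three terms sum to $1_{A_e}$ on the set $\bigcap_{f \in \partial e} A_f$.

```latex
\begin{align*}
p_e + b_e + c_e
  &= p_e
   + \Bigl[\mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr)
         - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr)\Bigr]
   + \Bigl[1_{A_e} - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}'_f\Bigr)\Bigr] \\
  &= 1_{A_e} + p_e - \mathbf{E}\Bigl(1_{A_e} \Bigm| \bigvee_{f \in \partial e} \mathcal{B}_f\Bigr).
\end{align*}
```

The fine conditional expectations cancel. When $\bigcap_{f \in \partial e} A_f$ is non-empty it is an atom of $\bigvee_{f \in \partial e} \mathcal{B}_f$, so on this set the coarse conditional expectation is the constant $\mathbf{E}(1_{A_e} \mid \bigcap_{f \in \partial e} A_f) = p_e$, and the right-hand side reduces to $1_{A_e}$ there.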


Our objective is to use the above estimates (30), (31), (32), (33) to conclude that

$$\mathbf{E}\Bigl(\prod_{e \in H} 1_{A_e}\Bigr) = \bigl(1 + o_{M_d \to \infty; |J|}(1)\bigr) \prod_{e \in H} p_e + O_{|J|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr). \tag{34}$$

This will be achieved by several applications of the Cauchy-Schwarz and triangle
inequalities. However, there is a certain amount of notational burden in order to
keep track of the expressions in the successive applications of these inequalities. It
will be convenient to return to the original sets $(V_j)_{j \in J}$. We can identify $A_e \in \mathcal{B}_e$
as a subset $\tilde A_e$ of $V_e = \prod_{j \in e} V_j$; similarly we can view the $\mathcal{A}_e$-measurable $b_e$
and $c_e$ as functions $\tilde b_e$ and $\tilde c_e$ on $V_e$. One can then write (34) in the form

$$\frac{1}{|V_J|} \sum_{(v_j)_{j \in J} \in V_J} \prod_{e \in H} 1_{\tilde A_e}\bigl((v_j)_{j \in e}\bigr) = \bigl(1 + o_{M_d \to \infty; |J|}(1)\bigr) \prod_{e \in H} p_e + O_{|J|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr). \tag{35}$$

For inductive purposes we will need to generalize^12 this formula.

Definition 4.1 (Hypergraph bundle). A hypergraph bundle over $H$ is a hypergraph
$G \subseteq 2^K$ on a finite set $K$, together with a map $\pi : K \to J$ (which we call the
projection map of the bundle), which is a hypergraph homomorphism (i.e. for each
edge $g \in G$, the function $\pi$ is injective on $g$ and $\pi(g) \in H$). For any $g \subseteq K$, we
write $V_g$ for the product set $V_g := \prod_{k \in g} V_{\pi(k)}$. We say that the bundle is closed
under set inclusion if whenever $g \in G$ and $g' \subseteq g$, we have $g' \in G$.

Remark 4.2. From a probabilistic viewpoint, the probability space $V_J$ corresponds
to sampling one vertex independently from each of the vertex classes $V_j$ of $V$,
whereas the more general spaces $V_g$ correspond to the possibility of sampling more
than one vertex independently from each of the vertex classes.
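The conditions in Definition 4.1 are easy to test mechanically on small examples. The following is a minimal sketch, assuming hypergraphs are stored as sets of `frozenset` edges and the projection map as a dictionary; the helper names (`down_closure`, `is_bundle`, `is_closed_under_inclusion`) and the toy data are hypothetical and not taken from the paper.

```python
from itertools import combinations


def down_closure(edges):
    """All subsets of the given edges, i.e. the closure under set inclusion."""
    return {
        frozenset(sub)
        for e in edges
        for r in range(len(e) + 1)
        for sub in combinations(sorted(e), r)
    }


def is_bundle(G, pi, H):
    """Check that pi is injective on every edge g of G and that pi(g) lies in H."""
    for g in G:
        image = frozenset(pi[k] for k in g)
        if len(image) != len(g) or image not in H:
            return False
    return True


def is_closed_under_inclusion(G):
    """Check that every subset of an edge of G is again an edge of G."""
    return down_closure(G) == set(G)


# Toy base hypergraph H on J = {1, 2, 3}: the edges of a triangle and their subsets.
H = down_closure([{1, 2}, {2, 3}, {1, 3}])
# K samples the vertex class V_3 twice (through "c" and "c2").
pi = {"a": 1, "b": 2, "c": 3, "c2": 3}
G = down_closure([{"a", "b"}, {"b", "c"}, {"b", "c2"}])
# Not a bundle: pi fails to be injective on this edge.
G_bad = {frozenset({"c", "c2"})}
```

In this example `G` samples two vertices from the third vertex class, which is the situation Remark 4.2 describes; `G_bad` is rejected because both of its elements project to the same index.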




The generalization of the formula (35) is then

Lemma 4.3 (Generalized counting lemma). Let $G \subseteq 2^K$ be a hypergraph bundle
over $H$ which is closed under set inclusion, with projection map $\pi : K \to J$.
Let $d' := \sup_{g \in G} |g|$ be the order of $G$. Then, if $F$ is sufficiently rapidly growing
depending on $d'$, $|J|$ and $|K|$, we have

$$\frac{1}{|V_K|} \sum_{(v_k)_{k \in K} \in V_K} \prod_{g \in G} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr) = \bigl(1 + o_{M_d \to \infty; d',|J|,|K|}(1)\bigr) \prod_{g \in G} p_{\pi(g)} + O_{d',|J|,|K|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr). \tag{36}$$

Observe that (35) is the special case of this lemma with $G = H$ (and $K = J$ and
$\pi$ being the identity map); note from construction of $H$ that $H$ is automatically
closed under set inclusion.


Proof. We shall use a double induction. Firstly, we shall induct on the order $d'$ of
the bundle $G$. When $d' = 0$ the claim is vacuously true (the left-hand side and the
main term of the right-hand side are both equal to 1), so we may assume $d' \ge 1$ and the
claim has already been proven for $d' - 1$ and for all choices of hypergraph bundle
$G \subseteq 2^K$ which are closed under set inclusion.

^12 The basic problem is that we need the Cauchy-Schwarz inequality to eliminate each of the $b_e$
factors in turn (using (32)), but each time we apply this inequality we essentially double the number
of free variables that one has to sum or average over. In particular, one ends up sampling more
than one point from each vertex class $V_j$, which forces us to leave the probabilistic framework that
has been so convenient for us in preceding sections and return to a combinatorial framework. One
could stay in the probabilistic framework using the machinery of tensor products (and conditional
tensor products) of probability spaces, but this would introduce even more excessive notation into
an already notation-heavy argument and would probably not be helpful to the reader.
Next, we fix $K$ and induct on the quantity $r := |\{g \in G : |g| = d'\}|$, which is a
positive integer between 1 and $2^{|K|}$. We thus assume that the claim has already
been proven for all smaller values of $r$ (note that for $r = 0$ this follows from the
previous induction hypothesis). The constants may change as we progress in this
induction, but since the number of steps in the induction cannot exceed $2^{|K|}$, this
will not be a concern.

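Let $g_0 \in G$ be such that $|g_0| = d'$. We use (30) to split

$$\prod_{g \in G} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr) = \Bigl[\prod_{g \in G \setminus \{g_0\}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \Bigl(p_{\pi(g_0)} + \tilde b_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr) + \tilde c_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr)\Bigr)$$

and consider the contribution of the three terms separately.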

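We first consider the contribution of the $p_{\pi(g_0)}$ term, which is the main term.
Applying the second induction hypothesis to $G \setminus \{g_0\}$ we see from (36) that

$$\frac{1}{|V_K|} \sum_{(v_k)_{k \in K} \in V_K} \prod_{g \in G \setminus \{g_0\}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr) = \bigl(1 + o_{M_d \to \infty; d',|J|,|K|}(1)\bigr) \prod_{g \in G \setminus \{g_0\}} p_{\pi(g)} + O_{d',|J|,|K|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr).$$

Multiplying this by the quantity $p_{\pi(g_0)}$, which is between 0 and 1, we see that the
contribution of this term to (36) is

$$\bigl(1 + o_{M_d \to \infty; d',|J|,|K|}(1)\bigr) \prod_{g \in G} p_{\pi(g)} + O_{d',|J|,|K|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr). \tag{37}$$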

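Next we consider the $\tilde c_{\pi(g_0)}$ term. We split $V_K = V_{g_0} \times V_{K \setminus g_0}$. Let us temporarily
freeze the values of $v_k$ for $k \in K \setminus g_0$, and consider the expression

$$\frac{1}{|V_{g_0}|} \sum_{(v_k)_{k \in g_0} \in V_{g_0}} \Bigl[\prod_{g \in G \setminus \{g_0\}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \tilde c_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr).$$

Observe that for each $g \in G \setminus \{g_0\}$, we have $g \ne g_0$ and $|g| \le d' = |g_0|$. Thus $g \cap g_0$
is a proper subset of $g_0$, and thus there exists an element of $\partial g_0$ which contains
$g \cap g_0$. Thus one can rewrite the product $\prod_{g \in G \setminus \{g_0\}} 1_{\tilde A_{\pi(g)}}((v_k)_{k \in g})$ in the form

$$\prod_{f \in \partial g_0} 1_{\tilde E_f}\bigl((v_k)_{k \in f}\bigr)$$

for some sets $\tilde E_f \subseteq V_f$ whose exact form is not important here (we allow the $\tilde E_f$ to
depend on the frozen $v_k$). Applying (33), we conclude that

$$\Bigl|\frac{1}{|V_{g_0}|} \sum_{(v_k)_{k \in g_0} \in V_{g_0}} \Bigl[\prod_{g \in G \setminus \{g_0\}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \tilde c_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr)\Bigr| \le \frac{1}{F(M_0)}.$$

Averaging this over all choices of the frozen variables $v_k$, $k \in K \setminus g_0$, we conclude that
the contribution of this term to (36) is at most

$$1/F(M_0). \tag{38}$$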


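Finally we consider the contribution of the $\tilde b_{\pi(g_0)}$ term, which is the most difficult
from a notational viewpoint to handle, mainly because of the need to invoke the
Cauchy-Schwarz inequality. We expand this contribution as

$$\frac{1}{|V_K|} \sum_{(v_k)_{k \in K} \in V_K} \Bigl[\prod_{g \in G \setminus \{g_0\}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \tilde b_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr).$$

We take absolute values and discard^13 the bounded factors $1_{\tilde A_{\pi(g)}}((v_k)_{k \in g})$ with
$|g| = d'$, to estimate this expression by

$$O\Bigl(\frac{1}{|V_K|} \sum_{(v_k)_{k \in K} \in V_K} \Bigl[\prod_{g \in G_{\subsetneq g_0} \cup G'} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \bigl|\tilde b_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr)\bigr|\Bigr)$$

where $G_{\subsetneq g_0} := \{g : g \subsetneq g_0\}$ and $G' := \{g \in G \setminus G_{\subsetneq g_0} : |g| \le d' - 1\}$. We factorize
this as

$$O\Bigl(\frac{1}{|V_{g_0}|} \sum_{(v_k)_{k \in g_0} \in V_{g_0}} \Bigl[\prod_{g \in G_{\subsetneq g_0}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \bigl|\tilde b_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr)\bigr| \Bigl[\frac{1}{|V_{K \setminus g_0}|} \sum_{(v_k)_{k \in K \setminus g_0} \in V_{K \setminus g_0}} \prod_{g \in G'} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr]\Bigr). \tag{39}$$

On the other hand, from (32) we have

$$\frac{1}{|V_{g_0}|} \sum_{(v_k)_{k \in g_0} \in V_{g_0}} \Bigl[\prod_{g \in G_{\subsetneq g_0}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \bigl|\tilde b_{\pi(g_0)}\bigl((v_k)_{k \in g_0}\bigr)\bigr|^2 \le \frac{1}{F(M_{d'})}\, \mathbf{E}\bigl(1_{A_{<\pi(g_0)}}\bigr),$$

and hence by Cauchy-Schwarz we can estimate (39) by

$$O\Bigl(F(M_{d'})^{-1/2}\, \mathbf{E}\bigl(1_{A_{<\pi(g_0)}}\bigr)^{1/2} \Bigl(\frac{1}{|V_{g_0}|} \sum_{(v_k)_{k \in g_0} \in V_{g_0}} \Bigl[\prod_{g \in G_{\subsetneq g_0}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \Bigl[\frac{1}{|V_{K \setminus g_0}|} \sum_{(v_k)_{k \in K \setminus g_0} \in V_{K \setminus g_0}} \prod_{g \in G'} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr]^2\Bigr)^{1/2}\Bigr). \tag{40}$$

^13 This discarding step is important as it lowers the total order of the expression being computed,
which compensates for a certain doubling of the hypergraph bundle which shall occur
shortly when we apply Cauchy-Schwarz. We can get away with this step because the smallness
of $\tilde b_{\pi(g_0)}$, given by (32), safely dominates any loss we absorb by discarding these high-order
factors.

From the first induction hypothesis we have

$$\mathbf{E}\bigl(1_{A_{<\pi(g_0)}}\bigr) = \bigl(1 + o_{M_d \to \infty; d',|J|}(1)\bigr) \prod_{g \in G_{\subsetneq g_0}} p_{\pi(g)} + O_{d',|J|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr)$$

and thus

$$\mathbf{E}\bigl(1_{A_{<\pi(g_0)}}\bigr) = O\Bigl(\prod_{g \in G_{\subsetneq g_0}} p_{\pi(g)}\Bigr) + O_{d',|J|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr). \tag{41}$$

Now we estimate the expression in parentheses in (40). As we shall see, this expression
can be rewritten in a form which can be handled by the induction hypothesis,
but with the hypergraph bundle $G$ replaced by a hypergraph of approximately twice
the size (roughly speaking, we throw away all edges of top order $d'$, and double all
the remaining edges that are not contained in $G_{\subsetneq g_0}$). It is this doubling which
forces us to work with a generalized counting lemma^14 rather than the original
counting lemma.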


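Let $\tilde K = K \oplus_{g_0} K$ be the set $K \times \{0,1\}$, with the elements $(k, 0)$ and $(k, 1)$ identified
for all $k \in g_0$. There is an obvious projection $\phi : \tilde K \to K$, and hence a map
$\pi \circ \phi : \tilde K \to J$. On $\tilde K$ we also place a hypergraph bundle $\tilde G$, defined as the set
$\{g \times \{i\} : g \in G_{\subsetneq g_0} \cup G',\ i \in \{0,1\}\}$; note that $g \times \{0\}$ and $g \times \{1\}$ will be identified
when $g \in G_{\subsetneq g_0}$. From the definitions we observe that

$$\frac{1}{|V_{g_0}|} \sum_{(v_k)_{k \in g_0} \in V_{g_0}} \Bigl[\prod_{g \in G_{\subsetneq g_0}} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr] \Bigl[\frac{1}{|V_{K \setminus g_0}|} \sum_{(v_k)_{k \in K \setminus g_0} \in V_{K \setminus g_0}} \prod_{g \in G'} 1_{\tilde A_{\pi(g)}}\bigl((v_k)_{k \in g}\bigr)\Bigr]^2 = \frac{1}{|V_{\tilde K}|} \sum_{(v_k)_{k \in \tilde K} \in V_{\tilde K}} \prod_{\tilde g \in \tilde G} 1_{\tilde A_{\pi \circ \phi(\tilde g)}}\bigl((v_k)_{k \in \tilde g}\bigr).$$

Applying the first induction hypothesis, we can write this expression as

$$\bigl(1 + o_{M_d \to \infty; d',|J|,|K|}(1)\bigr) \prod_{\tilde g \in \tilde G} p_{\pi \circ \phi(\tilde g)} + O_{d',|J|,|K|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr). \tag{42}$$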
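The doubling construction $K \oplus_{g_0} K$ can be written out explicitly for small bundles. The sketch below is only an illustration under stated assumptions: edges are `frozenset` objects, the identified copies of elements of $g_0$ are represented by the single label `(k, 0)`, and the names `double_bundle`, `down_closure` and the toy data are hypothetical rather than taken from the paper.

```python
from itertools import combinations


def down_closure(edges):
    """All subsets of the given edges, i.e. the closure under set inclusion."""
    return {
        frozenset(sub)
        for e in edges
        for r in range(len(e) + 1)
        for sub in combinations(sorted(e), r)
    }


def double_bundle(K, G, g0):
    """Build the doubled vertex set, the doubled bundle and the projection phi.

    g0 is assumed to be an edge of G of top order d' = len(g0).
    """
    d_top = len(g0)
    G_sub = {g for g in G if g < g0}  # proper subsets of g0
    G_prime = {g for g in G if g not in G_sub and len(g) <= d_top - 1}

    def lift(k, i):
        # The two copies of an element of g0 are identified, so both map to (k, 0).
        return (k, 0) if k in g0 else (k, i)

    K_tilde = {lift(k, i) for k in K for i in (0, 1)}
    G_tilde = {
        frozenset(lift(k, i) for k in g)
        for g in G_sub | G_prime
        for i in (0, 1)
    }
    phi = {kt: kt[0] for kt in K_tilde}
    return K_tilde, G_tilde, phi, G_sub, G_prime


# Toy example: a path with edges {1, 2} and {2, 3}, doubled along g0 = {1, 2}.
K = {1, 2, 3}
G = down_closure([{1, 2}, {2, 3}])
g0 = frozenset({1, 2})
K_tilde, G_tilde, phi, G_sub, G_prime = double_bundle(K, G, g0)
```

Each edge of `G_sub` contributes one edge to the doubled bundle and each edge of `G_prime` contributes two, which is the bookkeeping behind expressing the product over the doubled bundle in terms of products over $G_{\subsetneq g_0}$ and $G'$.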


^14 There is a possible alternate approach which avoids the Cauchy-Schwarz inequality, and hence
the need to work with hypergraph bundles. One can attempt to use the lower-order induction
hypothesis to show some uniform distribution properties concerning the intersections of the
lower-order atoms with each other, in order that the contribution of the $b_{g_0}$ error be shown to be
negligible. A model example of such a statement, in the graph setting, would be the assertion
that in an $\varepsilon$-regular graph $H$, the number of copies of a fixed small graph $G$ in $H$, with one edge
specified to be $(x, y)$, is usually close to a fixed quantity independent of $x$ and $y$, except for a small
number of exceptional pairs $(x, y)$. We will not pursue such an alternate approach here.
By the definition of $\tilde G$, we can write

$$\prod_{\tilde g \in \tilde G} p_{\pi \circ \phi(\tilde g)} = \prod_{g \in G_{\subsetneq g_0}} p_{\pi(g)} \prod_{g \in G'} p_{\pi(g)}^2$$

and thus by (31) and (20) we can rewrite (42) as

[equation lost in extraction]

Inserting this and (41) back into (40), we can estimate (40) by

$$O_{M_d,d',|J|,|K|}\Bigl(F(M_{d'})^{-1/2} \prod_{g \in G_{\subsetneq g_0}} p_{\pi(g)} \prod_{g \in G'} p_{\pi(g)}\Bigr) + O_{d',|J|,|K|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr).$$

Re-inserting those elements $g$ of $G$ for which $|g| = d'$ using (31), we can estimate
this by

$$O_{M_d,d',|J|,|K|}\Bigl(F(M_{d'})^{-1/4} \prod_{g \in G} p_{\pi(g)}\Bigr) + O_{d',|J|,|K|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr)$$

(for instance). By choosing $F$ sufficiently rapid depending on $d'$, $|J|$, $|K|$, we can
write this as

$$o_{M_d \to \infty; d',|J|,|K|}\Bigl(\prod_{g \in G} p_{\pi(g)}\Bigr) + O_{d',|J|,|K|,M_0}\Bigl(\frac{1}{F(M_0)}\Bigr).$$

Combining this with the bounds (37), (38) we obtain (36), which closes the induction.
This completes the proof of Lemma 4.3, and hence Lemma 3.4. ■